Comparing country policies in support of public research using the STIP Compass dataset

Draft, 23 November 2021

By Gildas Ehrmann https://www.linkedin.com/in/gildasehrmann/

1. Introduction: Motivation and scope of this work

The 2021 edition of the STIP Compass database provides a wealth of information on more than 6500 policy initiatives for 57 countries and the European Union. However, the portal’s current interactive dashboards do not allow to easily compare policy choices across countries. There is nevertheless appetite from policy makers, from policy analysts to get country overviews in specific policy areas, leveraging on that rich dataset, and to have the ability to compare countries against one another.

To address this need, this notebook aims to pilot visualisations of policy initiative data for five selected OECD Member Countries (Canada, France, Germany, Japan and Korea). These countries are selected as the quality of their data is relatively higher compared to that of others in the database. Using the 2021 STIP Compass, this work develops code in Python that visualises detailed aspects about public research policies and related policy instruments. While the scope of this analysis only covers a fraction of the STI policy field, it could be replicated and extended to approach other policy areas covered by the STIP Compass database (e.g. business innovation, technology transfer).

This notebook is structured as follows. Section 2 describes how the STIP Compass dataset is downloaded, pre-processed and made ready for analysis. Section 3 provides an overview of policies reported in the public research system (PRS) policy area of the dataset across the five selected countries. Section 4 focuses on specific policy instruments in support of public research, namely institutional funding for public research, project grants for public research and dedicated support to research infrastructure. Lastly, Section 5 reflects on the outcomes and limitations of this pilot, together with possible next steps.

2. Preparing the STIP Compass dataset

This notebook works directly using the full 2021 STIP Compass database published by the STIP Data Lab. This first section retrieves the dataset from the web and conducts a few preliminary operations. In particular, the relevant data is trimmed, arranged and formatted. It produces an extract of essential columns and introduces a few variables used throughout the analysis.

2.1. Loading the dataset

A dictionary (Dict_Headers) is created from the second row of the dataset to store and save headers for later use. Some corrections are made to reduce the length of the field "Long Description". On the dataset downloaded, corrections are made on variables including cleaning the fields "InitiativeID", "InstrumentID", aggregating the CountryLabel/CountryCode for Belgian reporting entities. All variables are transformed into numerical variables when appropriate. An extra field is added to store countries' OECD status (Member or Partner). All the variables created throughout the notebook will need to be added to the dictionary (Dict_Headers). Hence a large number of variables named in the format "isbudgetedxxx" are declared in the dictionary. These will be defined later and used in the colour shading process of graphs but need to be declared in the dictionary for simplicity.

2.2. Procedure to measure initiatives and instruments weights

The STIP Compass dataset maps budgetary data for policy initiatives. More specifically, it indicates their estimated yearly budget expenditures using a series of ranges (Table 1). To consider the scale of an initiative or an instrument, the net budget weight (NBW) is computed depending on each initiative budget range, as per Table 1. As with the rest of the STIP Compass policy data, the budget range used to compute NBW is reported by countries; when missing, it takes a value of 0. The weighted number of initiatives is obtained by adding 1 to the budget weight coefficient. In this notebook, most calculations count initiatives using the weighted value, which would accordingly range from 1 to 6.



Table 1. Budget ranges and their associated weight coefficients
Budget Range
(in Million Euros)
Budget weight
coefficient
Net budget-weighted
initiatives
Don’t know / Not applicable 0 (no weight) 1
Less than 1 M 0.005 1.005
1 M – 5 M 0.01 1.01
5 M – 20 M 0.05 1.05
20 M – 50 M 0.2 1.2
50 M – 100 M 0.5 1.5
100 M – 500 M 1 2
More than 500 M 5 6

2.3. Selecting countries

The countries to include in the analysis are stored in a dataframe (my_countries). Country selection can be changed at will from the list of 38 OECD Member Countries and 20+ countries and territories participating in STIP Compass data, although the degree and quality of reporting varies across countries. In this pilot notebook, 5 OECD member countries were selected: Canada, France, Germany, Japan and Korea.

These preliminary steps mentioned above are conducted in the code cell below (click to expand/collapse).

Show Code
# Import necessary libraries:

import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots


# Import the dataset

# Note: load online stip compass data using pipe '|' separator and skipping the second header (multi-indexing causes problems in the filtering)
#url = 'https://stip.oecd.org/assets/downloads/STIP_Survey.csv '
#compass_df = pd.read_csv(url, sep='|', skiprows=[1])

compass_df = pd.read_csv(r'C:\Users\Gildas\Downloads\STIP_Survey_test_andres.csv', sep='|', encoding='UTF-8-SIG', header=0, low_memory=False)


# Prepare the dataset: Trim data/ Correct issues in format / Create dictionnaries
    # Create a dictionary of column codes with column headers
    
Dict_Headers = compass_df.head(1).T.reset_index()
Dict_Headers.loc[1] = ['Code', 'Long Description']
new_header = Dict_Headers.iloc[1]
Dict_Headers = Dict_Headers[2:] 
Dict_Headers.columns = new_header
Dict_Headers.reset_index(inplace=True, drop=False)
Dict_Headers.drop(['index'], axis=1, inplace=True)


    #Dict_Headers['Long Description'].replace(r':', r':<br>')
Dict_Headers

    # add additional data to dictionnary
new_row2 = pd.DataFrame([['F41:V31', 'Funding amount allocated for an average time-period of:Not specified'], ['Unique', 'Unique'], ['OECD_STATUS', 'OECD_STATUS'], ['InstrumentID', 'InstrumentID'], ['NBW', 'NBW'], ['Total', 'Total'], ['isbudgeted', 'isbudgeted'], ['isbudgeted28', 'isbudgeted28'], ['isbudgeted29', 'isbudgeted29'], ['isbudgeted30', 'isbudgeted30'], ['isbudgeted31', 'isbudgeted31'], ['isbudgeted33', 'isbudgeted33'], ['isbudgeted34', 'isbudgeted34'], ['isbudgeted35', 'isbudgeted35'], ['isbudgeted804', 'isbudgeted804'],
                         ['isbudgeted805', 'isbudgeted805'], ['F78:V31', 'Not specified'], ['F76:V31', 'Not specified'], ['isbudgeted37', 'isbudgeted37'], ['isbudgeted38', 'isbudgeted38'], ['isbudgeted39', 'isbudgeted39'], ['isbudgeted40', 'isbudgeted40'], ['isbudgeted41', 'isbudgeted41'], ['isbudgeted806','isbudgeted806'],
                         ['isbudgeted856', 'isbudgeted856'], ['isbudgeted857','isbudgeted857'], ['isbudgeted858','isbudgeted858'], ['isbudgeted859','isbudgeted859'], ['isbudgeted860','isbudgeted860'], ['isbudgeted861','isbudgeted861'], ['isbudgeted862','isbudgeted862'],['isbudgeted863','isbudgeted863'],['F120:V863', 'Not provided']],columns=['Code', 'Long Description'])

    # Merge the two back dataframes together
Dict_Headers = Dict_Headers.append(new_row2, ignore_index=True)

    #Trim the dictionnary
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Maximum amount of grant awarded in euros:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Funding amount allocated for an average time-period of:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Performance-based element to the allocation:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Criteria for funding:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Type of activity:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Requires a form of collaboration:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Selection criteria:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Share of public funding \(as a % of total funding of the centre of excellence\):", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Requires a form of collaborative research:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Promotes international mobility of students and researchers:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Promotes intersectoral mobility \(e.g. between the academic and private sectors\):", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Objective\(s\):", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. technology, innovation\)", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. contract research, other grants\)", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(to promote regional or cluster policy\)", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(excellence\)", "", regex=True).str.strip()  
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Country name", "CountryLabel", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Country code", "CountryCode", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. contract research, competitive grants\)", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" intellectual property"," IP", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace(" \(e.g. women and other under-represented groups\)", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Institutional funding for public research:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Dedicated support to research infrastructures:", "", regex=True).str.strip()
Dict_Headers['Long Description'] = Dict_Headers['Long Description'].str.replace("Project grants for public research:", "", regex=True).str.strip()



    # First sets of treatment on data:

compass_df['NameResponsibleOrganisation'].replace(r'', np.NaN)
compass_df.Tags.fillna("¬", inplace=True)
compass_df.YearlyBudgetRange.fillna("Don't know", inplace=True)
compass_df.drop(index=compass_df.index[0], 
        axis=0, 
        inplace=True)


    # There's a few organisations missing - for the corresponding initiatives to still be retrieved, replace NaN with 'Unknown organisation':

compass_df['NameResponsibleOrganisation'].fillna('Unknown organisation',inplace=True)

    # Shorten the InitiativeID and InstrumentID fields, removing HTTP references
compass_df['InitiativeID'] = compass_df['InitiativeID'].str.replace("http://stip.oecd.org/2021/data/policyInitiatives/", "", regex=True).str.strip()
compass_df['InstrumentID'] = compass_df['InstrumentID'].str.replace("http://stip.oecd.org/2021/data/policyInitiatives/", "", regex=True).str.strip()
compass_df['InstrumentID'] = compass_df['InstrumentID'].str.replace("instrument/", "", regex=True).str.strip()


    # The case of Belgium: unify reporting entities under one country
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Brussels Capital", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Federal government", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Flanders", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Wallonia", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium - Wallonia-Brussels Federation", "Belgium").str.strip()
compass_df['CountryLabel'] = compass_df['CountryLabel'].str.replace("Belgium-Brussels Federation", "Belgium").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEBRU", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEFED", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEVLG", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEWAL", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEWBF", "BEL").str.strip()
compass_df['CountryCode'] = compass_df['CountryCode'].str.replace("BEWBF", "BEL").str.strip()

    # Ensure all the data is in the right format: 

for x in range(0, compass_df.shape[1]):
        compass_df.iloc[:,x] = compass_df.iloc[:,x].astype(int, errors='ignore')

# Complement the dataset:

    # Introduce a variable called PRS to filter PRS-specific data:

compass_df_PRS = compass_df[['TH10', 'TH18', 'TH19', 'TH20', 'TH21', 'TH22', 'TH23', 'TH24', 'TH25', 'TH26', 'TH27', 'TH64']].copy()
compass_df_PRS['PRS'] = compass_df_PRS.sum(axis = 1)
compass_df['PRS'] = compass_df_PRS['PRS']

    # Add column 'OECD_STATUS' on OECD membership vs partner

compass_df.loc[(compass_df['CountryCode'] == 'AUS') | (compass_df['CountryCode'] == 'AUT') | (compass_df['CountryCode'] == 'BEBRU') | (compass_df['CountryCode'] == 'BEFED') | (compass_df['CountryCode'] == 'BEVLG') | (compass_df['CountryCode'] == 'BEWAL') | (compass_df['CountryCode'] == 'BEWBF') | (compass_df['CountryCode'] == 'BEL') | (compass_df['CountryCode'] == 'CAN') | (compass_df['CountryCode'] == 'CHL') | (compass_df['CountryCode'] == 'COL') | (compass_df['CountryCode'] == 'CRI') | (compass_df['CountryCode'] == 'CZE') | (compass_df['CountryCode'] == 'DNK') | (compass_df['CountryCode'] == 'EST') | (compass_df['CountryCode'] == 'FIN') | (compass_df['CountryCode'] == 'FRA') | (compass_df['CountryCode'] == 'DEU') | (compass_df['CountryCode'] == 'GRC') | (compass_df['CountryCode'] == 'HUN') | (compass_df['CountryCode'] == 'ISL') | (compass_df['CountryCode'] == 'IRL') | (compass_df['CountryCode'] == 'ISR') | (compass_df['CountryCode'] == 'ITA') | (compass_df['CountryCode'] == 'JPN') | (compass_df['CountryCode'] == 'KOR') | (compass_df['CountryCode'] == 'LVA') | (compass_df['CountryCode'] == 'LTU') | (compass_df['CountryCode'] == 'LUX') | (compass_df['CountryCode'] == 'MEX') | (compass_df['CountryCode'] == 'NLD') | (compass_df['CountryCode'] == 'NZL') | (compass_df['CountryCode'] == 'NOR') | (compass_df['CountryCode'] == 'POL') | (compass_df['CountryCode'] == 'PRT') | (compass_df['CountryCode'] == 'SVK') | (compass_df['CountryCode'] == 'SVN') | (compass_df['CountryCode'] == 'ESP') | (compass_df['CountryCode'] == 'SWE') | (compass_df['CountryCode'] == 'CHE') | (compass_df['CountryCode'] == 'TUR') | (compass_df['CountryCode'] == 'GBR') | (compass_df['CountryCode'] == 'USA'), 'OECD_STATUS'] = 'Member'  
compass_df.loc[(compass_df['CountryCode'] != 'AUS') & (compass_df['CountryCode'] != 'AUT') & (compass_df['CountryCode'] != 'BEBRU') & (compass_df['CountryCode'] != 'BEFED') & (compass_df['CountryCode'] != 'BEVLG') & (compass_df['CountryCode'] != 'BEWAL') & (compass_df['CountryCode'] != 'BEWBF') & (compass_df['CountryCode'] != 'BEL') & (compass_df['CountryCode'] != 'CAN') & (compass_df['CountryCode'] != 'CHL') & (compass_df['CountryCode'] != 'COL') & (compass_df['CountryCode'] != 'CRI') & (compass_df['CountryCode'] != 'CZE') & (compass_df['CountryCode'] != 'DNK') & (compass_df['CountryCode'] != 'EST') & (compass_df['CountryCode'] != 'FIN') & (compass_df['CountryCode'] != 'FRA') & (compass_df['CountryCode'] != 'DEU') & (compass_df['CountryCode'] != 'GRC') & (compass_df['CountryCode'] != 'HUN') & (compass_df['CountryCode'] != 'ISL') & (compass_df['CountryCode'] != 'IRL') & (compass_df['CountryCode'] != 'ISR') & (compass_df['CountryCode'] != 'ITA') & (compass_df['CountryCode'] != 'JPN') & (compass_df['CountryCode'] != 'KOR') & (compass_df['CountryCode'] != 'LVA') & (compass_df['CountryCode'] != 'LTU') & (compass_df['CountryCode'] != 'LUX') & (compass_df['CountryCode'] != 'MEX') & (compass_df['CountryCode'] != 'NLD') & (compass_df['CountryCode'] != 'NZL') & (compass_df['CountryCode'] != 'NOR') & (compass_df['CountryCode'] != 'POL') & (compass_df['CountryCode'] != 'PRT') & (compass_df['CountryCode'] != 'SVK') & (compass_df['CountryCode'] != 'SVN') & (compass_df['CountryCode'] != 'ESP') & (compass_df['CountryCode'] != 'SWE') & (compass_df['CountryCode'] != 'CHE') & (compass_df['CountryCode'] != 'TUR') & (compass_df['CountryCode'] != 'GBR') & (compass_df['CountryCode'] != 'USA'), 'OECD_STATUS'] = 'Partner'  


    # Generate an order and assigne a coefficient for budget brackets:

Fundinggrid={'YearlyBudgetRange':["Don't know", 'Not applicable','Less than 1M','1M-5M','5M-20M','20M-50M','50M-100M','100M-500M','More than 500M'],'coefficient':[0,0,0.005,0.01,0.05,0.2,0.5,1,5],'order':[7,8,0,1,2,3,4,5,6],'isbudgeted':[0,1,1,1,1,1,1,1,1]}
Fundinggrid=pd.DataFrame(Fundinggrid)

    # Add the budget ordering to main dataframe and compute "NBW" (net-budget-weighted) for each datafrme entry 
compass_df = pd.merge(compass_df, 
                      Fundinggrid, 
                      on ='YearlyBudgetRange', 
                      how ='inner')
compass_df.rename(columns={'Unnamed: 0': 'OriginalID'}, inplace=True)
compass_df['NBW'] = compass_df['coefficient'] + 1

    # Filter out the columns we dont need:

compass_df = compass_df[['InitiativeID', 'OECD_STATUS', 'CountryLabel', 'CountryCode', 'StartDateYear', 'EndDateYear', 'NameResponsibleOrganisation', 'InstrumentTypeLabel', 'InstrumentID', 'YearlyBudgetRange', 'YearlyBudgetLocalCurrency', 'TG20', 'TG21', 'TG22', 'TG9', 'TG11', 'TG10', 'TG38', 'TG12', 'TG13', 'F38:V4', 'F38:V5', 'F40:V7', 'F40:V8', 'F40:V9', 'F40:V10', 'F40:V11', 'F40:V12', 'F40:V14', 'F40:V13', 'F40:V15', 'F40:V16', 'F40:V17', 'F40:V18', 'F40:V668', 'F40:V800', 'F40:V801', 'F40:V669', 'F41:V28', 'F41:V29', 'F41:V30', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V806', 'F77:V41', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F67:V145', 'F67:V146', 'F67:V147', 'F67:V148', 'F68:V150', 'F68:V839', 'F68:V151', 'F116:V823', 'F116:V824', 'F116:V825', 'F116:V826', 'F116:V827', 'F3:V687', 'F3:V830', 'F3:V90', 'F3:V91', 'F3:V92', 'F3:V93', 'F3:V94', 'F3:V688', 'F3:V95', 'F3:V96', 'F4:V98', 'F4:V99', 'F4:V100', 'F4:V101', 'F4:V689', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'PRS', 'coefficient', 'NBW', 'isbudgeted']].copy()

# Country selection: What country should the analysis be on?

    # Select your countries (CountryLabel)

my_countries = ("Korea", "Japan", "Germany", "France", "Canada")
my_countries = pd.DataFrame(my_countries)
my_countries = my_countries.rename(columns = {0: "CountryLabel"})
my_countries = my_countries.sort_values(["CountryLabel"]).copy()

    # Filter the list of countries CountryLabel/CountryCode
all_countries = compass_df[['CountryLabel', 'CountryCode']].copy()
all_countries = all_countries[['CountryLabel', 'CountryCode']].drop_duplicates().reset_index(inplace = False)
all_countries["Aggregate"] = "No"
all_countries = all_countries[['CountryLabel', 'CountryCode', 'Aggregate']]

    # Import Countrycode in my_countries - this dataframe will be called later in coordination with the main dataframe
my_countries = pd.merge(my_countries, 
                      all_countries, 
                      on ='CountryLabel', 
                      how ='inner')
new_row = pd.DataFrame([['OECD Median', 'OECD', 'Yes'], ['OECD Mean', 'OECD2', 'Yes']], columns=['CountryLabel', 'CountryCode', 'Aggregate'])
my_countries = my_countries.append(new_row, ignore_index=True)
all_countries = all_countries.append(new_row, ignore_index=True)

3. Overview of public research system policies

This section summarises the policies reported under the public research system (PRS) policy area, one of the six core areas covered by the STIP Compass dataset. Focusing the analysis on the selected five countries, it includes:

  • A table overview of the number of PRS initiatives and the percentage share of those that have funding data
  • A treemap showing the distribution of initiatives by responsible organisation
  • A scatter plot depicting the concentration of initiatives by organisation
  • A stacked bar chart of the distribution of initiatives by budget bracket
  • A bullet graph displaying the beneficiaries of PRS initiatives

3.1. Overview of PRS initiatives and funding

Table 2 displays the number of PRS initiatives reported in the 2021 edition of the EC-OECD STIP survey across the five countries. It also indicates the share of initiatives that have budget data (i.e. the initiative's estimated yearly expenditures). The selected five countries have budgetary information for the majority of policies, ranging from 79% for Korea to 93% for Japan. The degree of reporting (number of initiatives) by country varies widely, ranging from Japan reporting 28 initiatives up to Germany reporting 99 initiatives. To account for this, calculations made in this notebook are normalised by the country’s number of initiatives. Analysing this data requires a deep understanding of what the country has reported and how comprehensive and representative they are. Such an assessment would need inputs from national experts. In particular, do data points for their country as presented in the various charts below correspond to how they perceive their own country? Or does one or more of these charts misrepresent their country?

Show Code
# Data preparation:

    # Copy and filter the central repository:
compass_dfTreemap = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "YearlyBudgetRange", "PRS"]].copy()

    # Remove non-PRS data:
index_names = compass_dfTreemap[compass_dfTreemap['PRS'] < 0.3 ].index
compass_dfTreemap = compass_dfTreemap.drop(index_names, inplace = False)

    # Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_dfTreemap, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','YearlyBudgetRange'], values=['InitiativeID'], aggfunc={'InitiativeID': pd.Series.nunique}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Add details on budget
compass_df_pivot = pd.merge(compass_df_pivot, 
                      Fundinggrid, 
                      on ='YearlyBudgetRange', 
                      how ='inner')
compass_df_pivot = compass_df_pivot.sort_values(by='order')
compass_df_pivot = compass_df_pivot.drop(['coefficient', 'order'], axis=1)

compass_df_pivot['isbudgeted'] = compass_df_pivot['isbudgeted'] * compass_df_pivot['InitiativeID']

# aggregate by country, removing "YearlyBudgetRange"

compass_df_pivot = pd.pivot_table(data=compass_df_pivot, index=['OECD_STATUS', 'CountryCode', 'CountryLabel'], values=['InitiativeID', 'isbudgeted'], aggfunc={'InitiativeID': pd.Series.sum, 'isbudgeted': pd.Series.sum}, margins=True, margins_name='total', fill_value=0).reset_index()

# Get details for OECD Mean

    # Filter out non-OECD countries
compass_df_pivot_OECD = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]

    # Compute median in an extra row and append the row to dataframe
new_row = {'OECD_STATUS':'Member', 'CountryLabel':'OECD Mean', 'CountryCode':'OECD2', 'InitiativeID': compass_df_pivot_OECD['InitiativeID'].mean(), 'isbudgeted': compass_df_pivot_OECD['isbudgeted'].mean()}
compass_df_pivot = compass_df_pivot.append(new_row, ignore_index=True)


# Prepare the dataframe for table inclusion:

    # Select the countries to analyse
compass_df_pivot = compass_df_pivot.loc[compass_df_pivot["CountryCode"].isin(list(my_countries["CountryCode"]))]
compass_df_pivot['InitiativeID'] = round(compass_df_pivot['InitiativeID'].astype(float),2)


    # adjust the format to integer / percentage
compass_df_pivot['InitiativeID'] = round(compass_df_pivot['InitiativeID'].astype(float),0)
compass_df_pivot['InitiativeID'] = (compass_df_pivot['InitiativeID'].astype(int))
compass_df_pivot['isbudgeted'] = (compass_df_pivot['isbudgeted']/compass_df_pivot['InitiativeID'])
compass_df_pivot = compass_df_pivot.sort_values(by='isbudgeted', ascending=False)
compass_df_pivot['isbudgeted'] = compass_df_pivot['isbudgeted'].map(lambda n: '{:,.1%}'.format(n))

    # Trim to only show total initiatives and total initiatives with budget data
compass_df_pivot = compass_df_pivot.drop(['CountryCode', 'OECD_STATUS'], axis=1)
compass_df_pivot = compass_df_pivot.rename(columns = {"CountryLabel" : "Country", "isbudgeted" : "Total initiatives with budget data", "InitiativeID" : "Total initiatives"} )


# Generate a table
import plotly.figure_factory as ff

colorscale = [[0, '#272D31'],[.5, '#ffffff'],[1, '#ffffff']]

fig =  ff.create_table(compass_df_pivot, colorscale=colorscale)
fig.layout.width=800
fig.update_layout(title_text="Table 2: Total PRS initiatives and funding rates by country",
            title_font_family="verdana",
            title_font_color='#2a3f5f',                            
            title_font_size=15)
fig.update_layout({'margin':{'t':40}})
fig.show()

3.2. Distribution of initiatives by responsible organisation

Figure 1 shows the distribution of public organisations or bodies responsible for funding or managing PRS policy initiatives. The distribution of public bodies for each country is represented by squares, each corresponding to one body. For each of them, the square size is proportional to the number of PRS initiatives reported, as a share of the country’s total. This normalisation makes each countries' total area the same, despite numbers of initiatives and budgets reported varying by country (as per Table 2). The weighted initiatives by organisation is reported in the colour scale, from light yellow (lowest NBW) to dark red (highest NBW). Public bodies with missing funding data will be reported in light yellow along those with no funding. The colour scale is computed as the logarithm of aggregate organisations' NBW value. As the weighted initiatives tends to be highly concentrated in a few public bodies, using the logarithm in the colour scale helps to better distinguish the differences in budget allocations.

Show Code
# Data preparation

    # Copy and filter the central repository:
compass_dfTreemap = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()

    # Remove non-PRS data:
index_names = compass_dfTreemap[compass_dfTreemap['PRS'] < 0.3 ].index
compass_dfTreemap = compass_dfTreemap.drop(index_names, inplace = False)

    # Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_dfTreemap, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange', 'InitiativeID'], values=['coefficient', 'NBW'], aggfunc={'coefficient': pd.Series.mean, 'NBW': pd.Series.mean}, margins=False, margins_name='total', fill_value=0).reset_index()
compass_df_pivot["Unique"] = 1
compass_df_pivot = compass_df_pivot.drop_duplicates()



    # Filter out non-OECD countries
compass_df_pivot = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]


    # Select the countries to analyse
compass_df_pivot_filtered = compass_df_pivot.loc[compass_df_pivot["CountryLabel"].isin(list(my_countries["CountryLabel"]))]
compass_df_pivot_filtered = compass_df_pivot_filtered.reset_index(inplace = False)
compass_df_pivot_filtered = compass_df_pivot_filtered.drop(['index'], axis=1)

for x in range(0, compass_df_pivot_filtered.shape[0]):
    if compass_df_pivot_filtered["NameResponsibleOrganisation"].iloc[x] == "":
        compass_df_pivot_filtered["NameResponsibleOrganisation"] = "Organisation not provided"
            
            
    # Split responsible organisation fields when multiple organisation flagged:
test = compass_df_pivot_filtered[["NameResponsibleOrganisation","InitiativeID"]].copy()
test2 = test[['NameResponsibleOrganisation']].applymap(lambda x: str.count(x, '¬'))
test3 = test2['NameResponsibleOrganisation'].max()

test['total_org'] = test3
for x in range(0, test3):
    y = str(x)
    test['Name'+y] = test['NameResponsibleOrganisation'].str.split('¬', expand=True)[x]
for z in range(0, test.shape[0]):
    for x in range(0, test3):
        y = str(x)
        if test['Name'+y].loc[z] is None:
            test['total_org'].iloc[z] = test['total_org'].iloc[z] - 1
test = test.drop(['NameResponsibleOrganisation'], axis=1)

compass_df_pivot_filtered = pd.merge(compass_df_pivot_filtered, 
                      test, 
                      on ='InitiativeID', 
                      how ='inner')
compass_df_pivot_filtered['Unique'] = compass_df_pivot_filtered['Unique']# / compass_df_pivot_filtered['total_org']
compass_df_pivot_filtered['NBW'] = compass_df_pivot_filtered['NBW']# / compass_df_pivot_filtered['total_org']
compass_df_pivot_filtered = compass_df_pivot_filtered.drop(['total_org'], axis=1)

new_header= list(compass_df_pivot_filtered.columns[9:].values)
compass_df_pivot_filtered = compass_df_pivot_filtered.melt(id_vars=['OECD_STATUS', 'CountryCode', 'CountryLabel', 'YearlyBudgetRange', 'Unique', 'NBW', 'coefficient'], value_vars=new_header, var_name='Variable') #, value_name='InitiativeID')
compass_df_pivot_filtered = compass_df_pivot_filtered.drop(['Variable'], axis=1)
compass_df_pivot_filtered = compass_df_pivot_filtered.rename(columns = {"value" : "NameResponsibleOrganisation"})
compass_df_pivot_filtered = compass_df_pivot_filtered.dropna()

    # Generate a table summing 'Unique', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot_filtered = pd.pivot_table(data=compass_df_pivot_filtered, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange'], values=['Unique', 'NBW'], aggfunc={'Unique': sum, 'NBW': sum}, margins=True, margins_name='total', fill_value=0).reset_index()
compass_df_pivot_filtered = compass_df_pivot_filtered.rename(columns = {"Unique" : "InitiativeID"})

compass_df_pivot_filtered.dtypes

# Generate the graph

    # Select data for the graph
levels = ['NameResponsibleOrganisation'] # levels used for the hierarchical chart
color_columns = ['InitiativeID', 'NBW']
value_column = ['InitiativeID']

    # Lets use a dictionary to generate x dataframes
my_dict = {}
for x in range(0, my_countries[my_countries.Aggregate == "No"].shape[0]):
    my_dict[x] = "df_all_trees_" + str(x)

    # Generate the graph for each country
    
# Introduce the function "build_hierarchical_dataframe2"

def build_hierarchical_dataframe2(compass_df_pivot_filtered, levels, value_column, color_columns):
    """
    Build a hierarchy of levels for Treemap charts.
    Levels are given starting from the bottom to the top of the hierarchy,
    ie the last level corresponds to the root.
    """
    my_dict[x] = pd.DataFrame(columns=['id', 'parent', 'value', 'color'])
    for i, level in enumerate(levels):
        df_tree = pd.DataFrame(columns=['id', 'parent', 'value', 'color'])
        dfg = compass_df_pivot_filtered[compass_df_pivot_filtered.CountryCode == my_countries.CountryCode[x]].groupby(levels[i:], sort=False).sum()
        dfg = dfg.reset_index()
        df_tree['id'] = dfg[level].copy()
        if i < len(levels) - 1:
            df_tree['parent'] = dfg[levels[i+1]].copy()
        else:
            df_tree['parent'] = my_countries[my_countries.Aggregate == "No"].CountryLabel[x]
        df_tree['value'] = dfg[value_column]
        df_tree['color'] = dfg[color_columns[1]] #/ dfg[color_columns[0]]
        my_dict[x] = my_dict[x].append(df_tree, ignore_index=True)
    total = pd.Series(dict(id=my_countries[my_countries.Aggregate == "No"].CountryLabel[x], parent='',
                              value=compass_df_pivot_filtered[value_column].sum(),
                              #color=(compass_df_pivot_filtered[color_columns[0].sum()
                              color=(compass_df_pivot_filtered[color_columns[1]] / compass_df_pivot_filtered[color_columns[0]]).sum()))
    my_dict[x] = my_dict[x].append(total, ignore_index=True)
    return my_dict[x]

for x in range(0, my_countries[my_countries.Aggregate == "No"].shape[0]):
    my_dict[x] = build_hierarchical_dataframe2(compass_df_pivot_filtered, levels, value_column, color_columns)
    
fig = make_subplots(1, my_countries[my_countries.Aggregate == "No"].shape[0], specs=[[{"type": "domain"}, {"type": "domain"}, {"type": "domain"}, {"type": "domain"}, {"type": "domain"}]],)

for x in range(0, my_countries[my_countries.Aggregate == "No"].shape[0]):
    
    my_dict[x]['color'] = np.log(my_dict[x]['color'])
    
    fig.add_trace(go.Treemap(
        labels=my_dict[x]['id'], #[:100],
        parents=my_dict[x]['parent'],
        values=my_dict[x]['value'],
        branchvalues='total',
        marker=dict(
            colors=my_dict[x]['color'],
            colorscale='YlOrRd',
            showscale=False),
        hovertemplate='</b> Initiatives: %{value}<br> log(NBW): %{color:.3f}<br>',
        maxdepth=5,
        textfont=dict(size=15),# color=df_all_trees['color_two']),
        name=''
        ), 1, x+1)

#fig.update_traces(showscale=False)

fig.update_layout(margin = dict(t=100, l=0, r=0, b=0), title_text="Figure 1: Distribution of STI public bodies (by initiatives and their budget)",
                  title_font_size=17, title_font_family="verdana")
fig.show()

Figure 1 shows that Germany and Korea and to some extent France have a very large organisation centralising PRS initiatives and funding. By comparison, Japan's and Canada’s PRS initiatives and funding are less concentrated. The figure also shows that Canada, Germany and France have a larger number of public bodies in the PRS policy landscape, compared to Japan.

3.3. Concentration of PRS initiatives by organisation

The scatter plot in Figure 2 displays the average number of initiatives that each responsible public body manages within a given country, against the total number of bodies in the country. The figure gives additional perspective on the concentration of PRS initiatives for the five selected countries compared to the rest of OECD countries. It depicts the heterogeneity of reporting across countries of PRS initiatives, with responsible organisations reportedly in charge of managing such policies ranging from 3 to 41. Countries in the lower end of this distribution most likely do not have a sufficient level of reported for the data to be representative.

Show Code
# Copy and filter the central repository:
compass_df_pivot = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()

# Remove non-PRS data:
index_names = compass_df_pivot[compass_df_pivot['PRS'] < 0.3 ].index
compass_df_pivot = compass_df_pivot.drop(index_names, inplace = False)

# Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_df_pivot, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange', 'InitiativeID'], values=['coefficient', 'NBW'], aggfunc={'coefficient': pd.Series.mean, 'NBW': pd.Series.mean}, margins=False, margins_name='total', fill_value=0).reset_index()
compass_df_pivot["Unique"] = 1
compass_df_pivot = compass_df_pivot.drop_duplicates()

# Filter out non-OECD countries
compass_df_pivot = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]

# Split responsible organisation fields when multiple organisation flagged:
    
test = compass_df_pivot[["NameResponsibleOrganisation","InitiativeID"]].copy()
test2 = test[['NameResponsibleOrganisation']].applymap(lambda x: str.count(x, '¬'))
test3 = test2['NameResponsibleOrganisation'].max()
test['total_org'] = test3

for x in range(0, test3):
    y = str(x)
    test['Name'+y] = test['NameResponsibleOrganisation'].str.split('¬', expand=True)[x]
for z in range(0, test.shape[0]):
    for x in range(0, test3):
        y = str(x)
        if test['Name'+y].loc[z] is None:
            test['total_org'].iloc[z] = test['total_org'].iloc[z] - 1
test = test.drop(['NameResponsibleOrganisation'], axis=1)

# Copy and filter the central repository:
compass_df_pivot = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()

# Remove non-PRS data:
index_names = compass_df_pivot[compass_df_pivot['PRS'] < 0.3 ].index
compass_df_pivot = compass_df_pivot.drop(index_names, inplace = False)

# Copy and filter the central repository:
compass_df_pivot = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()

# Remove non-PRS data:
index_names = compass_df_pivot[compass_df_pivot['PRS'] < 0.3 ].index
compass_df_pivot = compass_df_pivot.drop(index_names, inplace = False)

# Generate a table summing 'InitiativeID', 'NBW' and 'coefficient' by 'YearlyBudgetRange' for each 'NameResponsibleOrganisation' in each 'CountryLabel':
compass_df_pivot = pd.pivot_table(data=compass_df_pivot, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','NameResponsibleOrganisation','YearlyBudgetRange', 'InitiativeID'], values=['coefficient', 'NBW'], aggfunc={'coefficient': pd.Series.mean, 'NBW': pd.Series.mean}, margins=False, margins_name='total', fill_value=0).reset_index()
compass_df_pivot["Unique"] = 1
compass_df_pivot = compass_df_pivot.drop_duplicates()

# Filter out non-OECD countries
compass_df_pivot = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]

# Split responsible organisation fields when multiple organisation flagged:
    
test = compass_df_pivot[["NameResponsibleOrganisation","InitiativeID"]].copy()
test2 = test[['NameResponsibleOrganisation']].applymap(lambda x: str.count(x, '¬'))
test3 = test2['NameResponsibleOrganisation'].max()
test['total_org'] = test3

for x in range(0, test3):
    y = str(x)
    test['Name'+y] = test['NameResponsibleOrganisation'].str.split('¬', expand=True)[x]
for z in range(0, test.shape[0]):
    for x in range(0, test3):
        y = str(x)
        if test['Name'+y].loc[z] is None:
            test['total_org'].iloc[z] = test['total_org'].iloc[z] - 1
test = test.drop(['NameResponsibleOrganisation'], axis=1)

compass_df_pivot = pd.merge(compass_df_pivot, 
                      test, 
                      on ='InitiativeID', 
                      how ='inner')
#compass_df_pivot['Unique'] = compass_df_pivot['Unique'] / compass_df_pivot['total_org']
compass_df_pivot = compass_df_pivot.drop(['total_org'], axis=1)

new_header= list(compass_df_pivot.columns[9:].values)
compass_df_pivot = compass_df_pivot.melt(id_vars=['OECD_STATUS', 'CountryCode', 'CountryLabel', 'YearlyBudgetRange', 'Unique', 'InitiativeID','NBW', 'coefficient'], value_vars=new_header, var_name='Variable') #, value_name='InitiativeID')
compass_df_pivot = compass_df_pivot.drop(['Variable'], axis=1)
compass_df_pivot = compass_df_pivot.rename(columns = {"value" : "NameResponsibleOrganisation"})
compass_df_pivot = compass_df_pivot.dropna()

# Data preparation:

    # First retrieve a table with count of initiatives per single organisation for each country 
compass_df_table = pd.pivot_table(data=compass_df_pivot, index=['CountryCode', 'CountryLabel', 'NameResponsibleOrganisation'], values=['Unique'], aggfunc={'Unique': pd.Series.sum}, margins=True, margins_name='total', fill_value=0).reset_index()
compass_df_table.head()


    # Remove the data not related to a country (OECD, total, ..)
compass_df_table = compass_df_table[compass_df_table.CountryLabel != ""]

    # Generate a second table by country, summing up the count of organisation and the average number of initiative by organisation for each country:
compass_df_table = pd.pivot_table(data=compass_df_table, index=['CountryCode', 'CountryLabel'], values=['Unique', 'NameResponsibleOrganisation'], aggfunc={'Unique': pd.Series.mean, 'NameResponsibleOrganisation': pd.Series.nunique}, margins=True, margins_name='total', fill_value=0).reset_index()


    # Compute OECD Median
new_row = {'CountryCode':'OECD', 'CountryLabel':'OECD Median', 'Unique': compass_df_table['Unique'].median(), 'NameResponsibleOrganisation': compass_df_table['NameResponsibleOrganisation'].median()}
    # Append the OECD data to the dataframes
compass_df_table = compass_df_table.append(new_row, ignore_index=True)

    # Remove the data not related to a country (total, ..)
compass_df_table = compass_df_table[compass_df_table.CountryLabel != ""]

    # Add a filter on total initiatived by country to filter out outliers - ex: 15 initiatives
compass_df_table['TotalInit'] = compass_df_table['Unique'] * compass_df_table['NameResponsibleOrganisation']

    # Drop countries below 15 Initiatives
compass_df_table = compass_df_table[compass_df_table.TotalInit >= 15]
#compass_df_table = compass_df_table[compass_df_table.CountryLabel != "Italy"]


    # Introduce CountryLabel2 which will take the value of CountryLabel for the country we want to analyse or "" otherwise
compass_df_table['CountryLabel2'] = compass_df_table['CountryLabel'].isin(list(my_countries.CountryLabel))
compass_df_table.loc[compass_df_table['CountryLabel2'] == True, 'CountryLabel2'] = compass_df_table['CountryLabel']
compass_df_table.loc[compass_df_table['CountryLabel2'] == False, 'CountryLabel2'] = ""

    # Round the average number of initiatives per organisation:
compass_df_table['Unique'] = round(compass_df_table['Unique'],2) 

# Generate the graph:

import plotly.graph_objects as go

Unique=compass_df_table["Unique"],
NameResponsibleOrganisation=compass_df_table["NameResponsibleOrganisation"],

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=compass_df_table["Unique"],
    y=compass_df_table["NameResponsibleOrganisation"],
    text="",
    name="",
    textposition='top center',
    mode='markers+text',
    marker=dict(
        color='rgba(156, 165, 196, 0.95)',
    )
))

fig.add_trace(go.Scatter(
    x=compass_df_table[compass_df_table.CountryLabel2 != ""].Unique,
    y=compass_df_table[compass_df_table.CountryLabel2 != ""].NameResponsibleOrganisation,
    text=compass_df_table[compass_df_table.CountryLabel2 != ""].CountryLabel2,
    name="", 
    textposition='top center',
    mode='markers+text',
    marker=dict(
        color='rgba(255, 0, 0, 0.95)',
    )
))

fig.update_layout(
    title={
        'text': "Figure 2: Average distribution of public research system initiatives by organisation",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
)

fig.update_xaxes(
    title_text='Average initiatives per organisation',
    tickvals=[1,2,3,4,5,6,7,8,9],
    rangemode='tozero',
    zeroline=True
)
fig.update_layout(showlegend=False)

fig.update_yaxes(
    title_text='Number of organisations',
    rangemode='tozero',
    zeroline=True
)
fig.show()

Canada, Germany and France have over twice as many organisations involved in PRS initiatives than OECD median. Out of the five selected countries, Canada has the largest average number of initiatives per organisation, followed by Germany. France displays a lower average of average initiatives per organisation compared to the five selected countries and the OECD median. Korea and Japan are closer to the median, both in terms of number of organisations and average initiatives per organisation.

3.4. Distribution of PRS initiatives by budget bracket

Figure 3 shows countries PRS policy initiatives funding profile. It represents the distribution of initiatives by budget range for selected countries and the OECD median, as a % of total initiatives within the country/median. This figure excludes initiatives without budgetary data, whether the information is missing or whether the initiative is reported not to have any budget.

Show Code
# Data preparation:

    # Filter on data needed and aggregate "dont know" and "not applicable" budget categories
compass_df_comparison = compass_df[["OECD_STATUS", "InitiativeID", "CountryCode", "NameResponsibleOrganisation", "YearlyBudgetRange", "coefficient", "NBW", "PRS"]].copy()
compass_df_comparison['YearlyBudgetRange'] = compass_df_comparison['YearlyBudgetRange'].replace(["Don't know"],'Not applicable')

    # Remove non-PRS initiatives
index_names = compass_df_comparison[compass_df_comparison['PRS'] < 1 ].index
compass_df_comparison = compass_df_comparison.drop(index_names, inplace = False)

# Complement with OECD data:
    # Generate a dataframe filtered on OECD member only:
compass_df_pivot3a = pd.pivot_table(data=compass_df_comparison[compass_df_comparison["OECD_STATUS"] == "Member"], index=['YearlyBudgetRange'], values=['InitiativeID'], columns=['CountryCode'], aggfunc={'InitiativeID': pd.Series.nunique}, margins_name='total', fill_value=0).reset_index()
compass_df_pivot3a.columns = compass_df_pivot3a.columns.droplevel(-2)

compass_df_pivot3a = compass_df_pivot3a.rename(columns = {"" : "YearlyBudgetRange"})
compass_df_pivot3b = compass_df_pivot3a.iloc[:,1:]

compass_df_pivot3a['OECD'] = compass_df_pivot3b.median(axis=1)
compass_df_pivot3a['OECD2'] = compass_df_pivot3b.mean(axis=1)

# Complement with OECD data:


    # Retrieve the pivot table for all countries
compass_df_pivot3 = pd.pivot_table(data=compass_df_comparison, index=['YearlyBudgetRange'], values=['InitiativeID'], columns=['CountryCode'], aggfunc={'InitiativeID': pd.Series.nunique}, margins_name='total', fill_value=0).reset_index()
compass_df_pivot3.columns = compass_df_pivot3.columns.droplevel(-2)

    # Add OECD Median/Mean
compass_df_pivot3['OECD'] = compass_df_pivot3a['OECD']
#compass_df_pivot3['OECD2'] = compass_df_pivot3a['OECD2']



    # Format the table
compass_df_pivot3 = compass_df_pivot3.rename(columns = {"" : "YearlyBudgetRange"})
compass_df_pivot3 = pd.merge(compass_df_pivot3, 
                      Fundinggrid, 
                      on ='YearlyBudgetRange', 
                      how ='inner')
compass_df_pivot3 = compass_df_pivot3.sort_values(by='order')
compass_df_pivot3 = compass_df_pivot3.drop(['coefficient', 'order'], axis=1)

# Prepare the dataframe for graph inclusion:
    # Optional: remove "Not applicable"
compass_df_pivot3 = compass_df_pivot3[compass_df_pivot3.YearlyBudgetRange != "Not applicable"].copy()



    # Compute proportion of total initiatives
for x in range(1, compass_df_pivot3.shape[1]):
       compass_df_pivot3.iloc[:,x] = compass_df_pivot3.iloc[:,x]/ compass_df_pivot3.iloc[:,x].sum()

    # Transpose the data for the stacked bar graph by country
compass_df_pivot3 = compass_df_pivot3.T
new_header = compass_df_pivot3.iloc[0]
compass_df_pivot3 = compass_df_pivot3[1:] 
compass_df_pivot3.columns = new_header
compass_df_pivot3 = compass_df_pivot3 * 100




    # Format the column name
compass_df_pivot3.index = compass_df_pivot3.index.set_names('YearlyBudgetRange')
compass_df_pivot3 = compass_df_pivot3.reset_index()
compass_df_pivot3.rename(columns = {"YearlyBudgetRange":"CountryCode"}, inplace="False")

    # Filter on my_countries and add back CountryLabel
compass_df_pivot3 = pd.merge(compass_df_pivot3, 
                      all_countries, 
                      on ='CountryCode', 
                      how ='inner')
compass_df_pivot3 = compass_df_pivot3.drop(['CountryCode', 'Aggregate'], axis=1)
compass_df_pivot3 = pd.merge(compass_df_pivot3, 
                      my_countries, 
                      on ='CountryLabel', 
                      how ='inner')

# Generate the graph:

import pygal

    # Setup HTML for Pygal graphs
%matplotlib inline
from IPython.display import SVG, HTML

html_pygal = """
<!DOCTYPE html>
<html>
  <head>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/svg.jquery.js"></script>
  <script type="text/javascript" src="http://kozea.github.com/pygal.js/javascripts/pygal-tooltips.js"></script>
    <!-- ... -->
  </head>
  <body>
    <figure>
      {pygal_render}
    </figure>
  </body>
</html>
"""


    #define the color code
from bokeh.palettes import YlOrRd
colors = YlOrRd[9]
colors

def Reverse(tuples):
        new_tup = tuples[::-1]
        return new_tup
colors = Reverse(colors)



from pygal.style import Style
custom_style = Style(
  background='transparent',
  plot_background='transparent',
  foreground='#000000', #53E89B',
  foreground_strong='#000000', #53A0E8',
  foreground_subtle='#630C0D',
  opacity='.6',
  opacity_hover='.9',
  transition='400ms ease-in',
  colors=colors)


line_chart = pygal.StackedBar(style=custom_style)

line_chart.title = 'Figure 3: Distribution of initiatives by budget bracket (in EUR)'
line_chart.x_labels = compass_df_pivot3['CountryLabel']
line_chart.y_labels = [
  {'label': '10%', 'value': 10},
  {'label': '20%', 'value': 20},
  {'label': '30%', 'value': 30},
  {'label': '40%', 'value': 40},
  {'label': '50%', 'value': 50},
  {'label': '60%', 'value': 60},
  {'label': '70%', 'value': 70},
  {'label': '80%', 'value': 80},
  {'label': '90%', 'value': 90},
  {'label': '100%', 'value': 100}]

for x in range(0, compass_df_pivot3.shape[1]-3):
    line_chart.add(compass_df_pivot3.iloc[:,x].name, compass_df_pivot3.iloc[:,x].astype(float))

HTML(html_pygal.format(pygal_render=line_chart.render()))
Figure 3: Distribution of initiatives by budget bracket (in EUR)10%1020%2030%3040%4050%5060%6070%7080%8090%90100%100CanadaGermanyFranceJapanKoreaOECD Median9.61538461558.26410256410257480.3542899408284Canada12.32876712152.23846153846154473.6491043203372Germany15.38461538246.2128205128205466.0976331360947France5.555555556340.18717948717944490.38675213675214Japan0434.1615384615384504.11538461538464Korea20.51282051528.1358974358974453.4250493096647OECD Median17.3076923158.26410256410257413.823224852071Canada26.02739726152.23846153846154378.8651211801897Germany30.76923077246.2128205128205352.0443786982249France16.66666667340.18717948717944435.47222222222223Japan4.761904762434.1615384615384492.3479853479854Korea20.51282051528.1358974358974352.04437869822493OECD Median21.1538461558.26410256410257318.7788461538462Canada24.65753425152.23846153846154253.6148577449948Germany11.53846154246.2128205128205247.49556213017755France22.22222222340.18717948717944339.3717948717949Japan9.523809524434.1615384615384457.04578754578756Korea20.51282051528.1358974358974250.66370808678508OECD Median17.3076923158.26410256410257223.7344674556213Canada8.219178082152.23846153846154172.37144362486836Germany15.38461538246.2128205128205180.96449704142015France11.11111111340.18717948717944257.00000000000006Japan0434.1615384615384433.510989010989Korea15.38461538528.1358974358974161.9556213017752OECD Median9.61538461558.26410256410257157.20340236686394Canada15.06849315152.23846153846154114.8240252897788Germany3.846153846246.2128205128205133.44230769230774France16.66666667340.18717948717944188.35683760683764Japan4.761904762434.1615384615384421.7435897435897Korea7.692307692528.1358974358974104.92899408284032OECD Median15.3846153858.2641025641025795.42455621301778Canada5.479452055152.2384615384615464.0468914646998Germany3.846153846246.2128205128205114.4334319526628France22.22222222340.1871794871794492.25641025641025Japan23.80952381434.1615384615384351.13919413919416Korea10.25641026528.135897435897460.57495069033544OECD Median9.61538461558.2641025641025733.64571005917165Canada8.219178082152.2384615384615430.195468914647137Germany19.23076923246.212820512820557.40680473372791France5.555555556340.1871794871794423.613247863247835Japan57.14285714434.1615384615384151.09340659340666Korea5.128205128528.135897435897422.557199211045514OECD MedianFigure 3: Distribution of initiatives by budget bracket (in EUR)Less than 1M1M-5M5M-20M20M-50M50M-100M100M-500MMore than 500M'

Figure 3 allows comparing the funding profile of the five selected countries and the OECD median. In Canada and Japan, more than half of the initiatives have a yearly budget expenditure of 20 million EUR or more. By contrast, in Germany and France, more than half of reported initiatives have an annual budget of up to 20 million EUR or less. These two countries follow a pattern similar to the OECD median, though France has a larger share of initiatives exceeding 500 million EUR. Korea appears to be an outlier with more than 80% of initiatives reported with a budget over 100M EUR. Data for this country should be further inspected to understand this funding profile.

3.5. Target groups of PRS initiatives

The final graph in the overview of PRS policy initiatives focuses on the target groups or the direct recipients of funding (Figure 5). Initiatives generally target individual researchers (including primarily established researchers, postdocs and researchers), organisations (public research institutes or higher education institutes), or a mix of both. To this end, a value is assigned for each initiative: 0 when targeting individual researchers only, 1 when targeting organisations exclusively, and 0.5 when directed towards a mix of both. An aggregate ratio by country is then computed by aggregating these values and the associated initiatives’ budget coefficients, to summarise each country's tendency in terms of these two types of beneficiaries. The colour shading indicates the extent to which the reported initiatives have budget information, with dark red signalling funding details available for all, and light-yellow evidencing largely missing funding details. Finally, the total number of initiatives is indicated next to each country label.

Show Code
# Data preparation:
    # Select data from central dataframe, and summarise target type in two variables, TargetOrganisations and TargetIndividuals 
compass_df_TargetFunding = compass_df.copy()
compass_df_TargetFunding['TargetOrganisations'] = compass_df_TargetFunding['TG20'] + compass_df_TargetFunding['TG21'] + compass_df_TargetFunding['TG22']
compass_df_TargetFunding['TargetIndividuals'] = compass_df_TargetFunding['TG9'] + compass_df_TargetFunding['TG10'] + compass_df_TargetFunding['TG11'] + compass_df_TargetFunding['TG12'] + compass_df_TargetFunding['TG13'] + compass_df_TargetFunding['TG38']
compass_df_TargetFunding['TargetOrganisations'] = compass_df_TargetFunding['TargetOrganisations'].astype(float)
compass_df_TargetFunding['TargetIndividuals'] = compass_df_TargetFunding['TargetIndividuals'].astype(float)

    # Remove non PRS data
index_names = compass_df_TargetFunding[compass_df_TargetFunding['PRS'] < 0.3 ].index
compass_df_TargetFunding = compass_df_TargetFunding.drop(index_names, inplace = False)

    # Compute a NBW-weighted value for both TargetOrganisations and TargetIndividuals
compass_df_TargetFunding['TargetOrganisations'] = (compass_df_TargetFunding['TargetOrganisations']/compass_df_TargetFunding['TargetOrganisations']*compass_df_TargetFunding['NBW']).astype(float)
compass_df_TargetFunding['TargetIndividuals'] = (compass_df_TargetFunding['TargetIndividuals']/compass_df_TargetFunding['TargetIndividuals']*compass_df_TargetFunding['NBW']).astype(float)
compass_df_TargetFunding['Unique'] = 1

    # Generate a table listing initiatives by country
compass_df_TargetFundingPivot = pd.pivot_table(data=compass_df_TargetFunding, index=["OECD_STATUS", 'CountryLabel', 'CountryCode', 'InitiativeID'], values=['isbudgeted','TargetOrganisations','TargetIndividuals','Unique'], aggfunc={'isbudgeted': pd.Series.mean, 'TargetOrganisations': sum, 'TargetIndividuals': sum, 'Unique': sum}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Divide TargetIndividuals and TargetOrganisations by Unique to ensure Initiatives with multiple entries are not double-counted: 
compass_df_TargetFundingPivot["TargetIndividuals"] = compass_df_TargetFundingPivot["TargetIndividuals"] / compass_df_TargetFundingPivot["Unique"]
compass_df_TargetFundingPivot["TargetOrganisations"] = compass_df_TargetFundingPivot["TargetOrganisations"] / compass_df_TargetFundingPivot["Unique"]
compass_df_TargetFundingPivot = compass_df_TargetFundingPivot.fillna(0)

    # Introduce a third variable "BOTHINDORG" for initiatives targeting both individuals and organisations
        # Note: All three binaries (TargetIndividuals, TargetOrganisations,BothINDORG) must be mutually exclusive
compass_df_TargetFundingPivot['BothINDORG'] = np.where(compass_df_TargetFundingPivot['TargetIndividuals']!= compass_df_TargetFundingPivot['TargetOrganisations'], 0, compass_df_TargetFundingPivot['TargetOrganisations'])
compass_df_TargetFundingPivot['TargetIndividuals'] = compass_df_TargetFundingPivot['TargetIndividuals'] - compass_df_TargetFundingPivot['BothINDORG']
compass_df_TargetFundingPivot['TargetOrganisations'] = compass_df_TargetFundingPivot['TargetOrganisations'] - compass_df_TargetFundingPivot['BothINDORG']
compass_df_TargetFundingPivot = compass_df_TargetFundingPivot.drop(['Unique'], axis = 1)
        # Note: We now have a table listing for each initiatives, the NBW by TargetIndividuals, TargetOrganisations and BothINDORG

    # Generate a table summing NBW for each variables by country
compass_df_TargetFundingPivot2 = pd.pivot_table(data=compass_df_TargetFundingPivot, index=["OECD_STATUS", 'CountryLabel', 'CountryCode'], values=['isbudgeted', 'InitiativeID', 'TargetOrganisations','TargetIndividuals', 'BothINDORG'], aggfunc={'isbudgeted': pd.Series.mean, 'InitiativeID':pd.Series.nunique,'TargetOrganisations': sum, 'TargetIndividuals': sum, 'BothINDORG': sum}, margins=True, fill_value=0).reset_index()

# Complement with OECD data:
    # Compute OECD Median from dataframe obtained
compass_df_TargetFundingPivot2OECD = compass_df_TargetFundingPivot2.copy()

    # Select only OECD members
index_names = compass_df_TargetFundingPivot2OECD[(compass_df_TargetFundingPivot2OECD['OECD_STATUS'] != 'Member')].index
compass_df_TargetFundingPivot2OECD.drop(index_names, inplace = True)

    # Compute median in an extra row and append the row to dataframe
new_row = {'OECD_STATUS':'Member', 'CountryLabel':'OECD Median', 'CountryCode':'OECD', 'BothINDORG': compass_df_TargetFundingPivot2OECD['BothINDORG'].median(), 'TargetIndividuals': compass_df_TargetFundingPivot2OECD['TargetIndividuals'].median(), 'TargetOrganisations': compass_df_TargetFundingPivot2OECD['TargetOrganisations'].median(), 'isbudgeted':compass_df_TargetFundingPivot2OECD['isbudgeted'].median(), 'InitiativeID':compass_df_TargetFundingPivot2OECD['InitiativeID'].median()}
compass_df_TargetFundingPivot2 = compass_df_TargetFundingPivot2.append(new_row, ignore_index=True)

# generate alternative country label with funded initiative / total initiative
compass_df_TargetFundingPivot_Label = compass_df_TargetFundingPivot2[['CountryLabel', 'InitiativeID', 'isbudgeted']].copy()
compass_df_TargetFundingPivot_Label['isbudgeted'] = compass_df_TargetFundingPivot_Label['isbudgeted'] * compass_df_TargetFundingPivot_Label['InitiativeID']
compass_df_TargetFundingPivot_Label['isbudgeted'] = compass_df_TargetFundingPivot_Label['isbudgeted'].astype(int)
compass_df_TargetFundingPivot_Label['InitiativeID'] = compass_df_TargetFundingPivot_Label['InitiativeID'].astype(int)

compass_df_TargetFundingPivot_Label['CountryLabel2'] = compass_df_TargetFundingPivot_Label['CountryLabel'] + " (" + compass_df_TargetFundingPivot_Label['isbudgeted'].astype(str) + " / " + compass_df_TargetFundingPivot_Label['InitiativeID'].astype(str) + ") "
compass_df_TargetFundingPivot_Label['CountryLabel2'] = compass_df_TargetFundingPivot_Label['CountryLabel'] + " (" + compass_df_TargetFundingPivot_Label['InitiativeID'].astype(str) + ") "
compass_df_TargetFundingPivot_Label = compass_df_TargetFundingPivot_Label.drop(['InitiativeID', 'isbudgeted'], axis=1).copy()


# Prepare the dataframe for graph inclusion:
    # Reorder the columns to have variables on the right:
compass_df_TargetFundingPivot2 = compass_df_TargetFundingPivot2.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'isbudgeted', 'BothINDORG', 'TargetIndividuals', 'TargetOrganisations'])

    # Introduce a variable 'Total' that sums all the initiatives
compass_df_TargetFundingPivot2['Total'] = 0
for x in range(4, compass_df_TargetFundingPivot2.shape[1]-1):
       compass_df_TargetFundingPivot2['Total'] = compass_df_TargetFundingPivot2['Total'] + compass_df_TargetFundingPivot2.iloc[:,x]

    # Obtain the data in percent instead:
for x in range(4, compass_df_TargetFundingPivot2.shape[1]-1):
       compass_df_TargetFundingPivot2.iloc[:,x] = compass_df_TargetFundingPivot2.iloc[:,x]/ compass_df_TargetFundingPivot2['Total']

    # Format the percentages obtained
for x in range(4, compass_df_TargetFundingPivot2.shape[1]-1):
        compass_df_TargetFundingPivot2.iloc[:,x] = pd.Series(["{0:.5f}".format(val) for val in compass_df_TargetFundingPivot2.iloc[:,x]], index = compass_df_TargetFundingPivot2.index)
        
    # Format the data as float for display and computation of ratio
for x in range(3, compass_df_TargetFundingPivot2.shape[1]):
        compass_df_TargetFundingPivot2.iloc[:,x] = compass_df_TargetFundingPivot2.iloc[:,x].astype(float)

    # Compute ratio of targets
compass_df_TargetFundingPivot2["Ratio"] = compass_df_TargetFundingPivot2["BothINDORG"] / 2 + compass_df_TargetFundingPivot2["TargetOrganisations"] 

    # Select the countries to analyse
compass_df_TargetFundingPivot2_filtered = compass_df_TargetFundingPivot2.loc[compass_df_TargetFundingPivot2["CountryCode"].isin(list(my_countries["CountryCode"]))]


# amend the label to add the extended labels:
compass_df_TargetFundingPivot2_filtered = pd.merge(compass_df_TargetFundingPivot2_filtered,
                      compass_df_TargetFundingPivot_Label,
                      on ='CountryLabel', 
                      how ='inner')
    # Get rid of intermediate totals (not needed any more)
compass_df_TargetFundingPivot2_filtered['CountryLabel'] = compass_df_TargetFundingPivot2_filtered['CountryLabel2'] 
compass_df_TargetFundingPivot2_filtered = compass_df_TargetFundingPivot2_filtered.drop(['CountryLabel2'], axis=1).copy()
compass_df_TargetFundingPivot2_filtered

# Generate the graph:

import plotly.graph_objects as go

country = compass_df_TargetFundingPivot2_filtered['CountryLabel']

Ratio = compass_df_TargetFundingPivot2_filtered['Ratio']

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=Ratio,
    y=country,
    name='Researchers, students and teachers',
    marker=dict(
        size=13,
        cmax=1,
        cmin=0,
        color=compass_df_TargetFundingPivot2_filtered['isbudgeted'],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Initiative funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of initiaitves", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f',
        ),
        line_color='rgba(156, 165, 196, 1.0)',
    )
))

fig.update_traces(mode='markers', marker=dict(line_width=1, symbol='circle', size=13))

fig.update_layout(
    title={
        'text': "Figure 4: Target groups in PRS initiatives <br> as a share of total initiatives within country (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),  
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=300,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.update_xaxes(
    title_text='Funding targets',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(-0.05, 1.1),
    ticktext=["Only researchers, students <br> & teachers targeted", "Both equally <br> targeted", "Only research & education <br> institutions targeted"],
    tickvals=[0.01, 0.5, 1],
    showgrid=True,
    gridcolor="#444",
#    gridfont=dict(family="Open Sans"),
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
 
fig.show()

The OECD median leans towards targeting public research organisations over funding of individual researchers. Among the five selected countries, initiatives in Canada and Korea appear to target individual researchers more often. Germany, France and Japan lean slightly more towards targeting organisations compared to the OECD median. Funding details are reported for most initiatives within these countries, although, as mentioned in Section 1, 21% of initiatives in Korea are missing budget data.

4. Focusing on research funding instruments

This section focuses on specific instruments relevant for public research:

  • Institutional funding for public research
  • Project grants for public research
  • Dedicated support for research infrastructures

While Section 3 focused on initiatives reported under the PRS policy area, this section looks at the full STIP Compass database. This is because the above instruments are reported in other areas of the survey (e.g. Science-industry knowledge transfer and sharing). Table 3 indicates the number of instruments reported by country and instrument type. More details on the availability of funding data are provided in the following sections.

Show Code
# Data preparation:

# Copy and filter the central repository:
compass_dfTreemap = compass_df[["OECD_STATUS", "InitiativeID", "CountryLabel", "CountryCode", "InstrumentTypeLabel", "InstrumentID"]].copy()

# Generate a table summing 'InstrumentID' numbers by instrment type
compass_df_pivot = pd.pivot_table(data=compass_dfTreemap, index=['OECD_STATUS', 'CountryCode', 'CountryLabel','InstrumentTypeLabel'], values=['InstrumentID'], aggfunc={'InstrumentID': pd.Series.nunique}, margins=True, margins_name='total', fill_value=0).reset_index()
compass_df_pivot = compass_df_pivot[(compass_df_pivot.InstrumentTypeLabel == "Institutional funding for public research") |(compass_df_pivot.InstrumentTypeLabel == "Project grants for public research") |(compass_df_pivot.InstrumentTypeLabel == "Dedicated support to research infrastructures")]
compass_df_pivot['InstrumentID'] = compass_df_pivot['InstrumentID'].astype(int)

compass_df_pivot=pd.pivot_table(compass_df_pivot, index=['CountryCode','OECD_STATUS','CountryLabel'], columns='InstrumentTypeLabel', values='InstrumentID').reset_index()

# Compute median in an extra row and append the row to dataframe
compass_df_pivot_OECD = compass_df_pivot[(compass_df_pivot.OECD_STATUS == "Member")]
new_row = {'OECD_STATUS':'Member', 'CountryLabel':'OECD Mean', 'CountryCode':'OECD2', 'Dedicated support to research infrastructures': round(compass_df_pivot_OECD['Dedicated support to research infrastructures'].mean(),2), 'Institutional funding for public research': round(compass_df_pivot_OECD['Institutional funding for public research'].mean(),2), 'Project grants for public research':round(compass_df_pivot_OECD['Project grants for public research'].mean(),2)}
compass_df_pivot = compass_df_pivot.append(new_row, ignore_index=True)

# filter for selected countries
compass_df_pivot = compass_df_pivot.loc[compass_df_pivot["CountryCode"].isin(list(my_countries["CountryCode"]))]
compass_df_pivot.reset_index()
compass_df_pivot = compass_df_pivot.drop(['CountryCode', 'OECD_STATUS'], axis=1).copy()


# reorder and rename the columns
compass_df_pivot = compass_df_pivot.reindex(columns=['CountryLabel', 'Institutional funding for public research', 'Project grants for public research', 'Dedicated support to research infrastructures'])
compass_df_pivot = compass_df_pivot.rename(columns={"Dedicated support to research infrastructures":"Dedicated support to <br>research infrastructures", "Institutional funding for public research":"Institutional funding <br>for public research", "Project grants for public research":"Project grants for <br>public research"})


# Generate a table
import plotly.figure_factory as ff

colorscale = [[0, '#272D31'],[.5, '#ffffff'],[1, '#ffffff']]

fig =  ff.create_table(compass_df_pivot, colorscale=colorscale)
fig.layout.width=850
fig.update_layout(title_text="Table 3: Number of instruments reported by country and selected instrument type",
            title_font_family="verdana",
            title_font_color='#2a3f5f',                            
            title_font_size=15)
fig.update_layout({'margin':{'t':40}})
fig.show()

4.1. Institutional funding for public research

The first series of graph looks at institutional funding for public research, including:

  • Non-competitive grants funding public research organisations according to various criteria (e.g. research capacity and performance indicators) to fulfil their research missions.
  • Block funding provides public research organisations with stable resources and a certain degree of autonomy in their research activities.

4.1.1. Funding horizons

Countries allocate institutional funding for a number of years. Figure 5 looks at the prevalence of different durations, i.e. 3 years or less, 4-6 years and 7 years or more. For each country, the figure indicates the weighted distribution of initiatives across three distinct budgeted horizons. In some instances, the time horizon is not specified in the instrument survey: these cases appear as "Not specified" in the figure. For each time horizon, the size of the corresponding bubble is proportional to the weighted number of instruments within each country, as a share of total weighted instruments reported in the country. As in the previous figure, the colour shading indicates the extent to which the instruments have associated budget information. The number of institutional funding instruments reported are also indicated next to each country label.

Show Code
# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research":
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Institutional funding for public research")]

    # Generate a table of initiatives by time-period for each country
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID', 'NBW', 'InstrumentTypeLabel'], values=['isbudgeted', 'F41:V28','F41:V29','F41:V30'], aggfunc={'isbudgeted': pd.Series.mean, 'F41:V28': sum, 'F41:V29': sum, 'F41:V30': sum}, fill_value=0).reset_index()

    # Add an extra column to account for missing timeline:
compass_df_Instrument["F41:V31"] = 0
for x in range(6, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F41:V31"] = compass_df_Instrument["F41:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]-1):
    if compass_df_Instrument["F41:V31"].loc[x] != 0:
        compass_df_Instrument["F41:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F41:V31"].loc[x]  = 1
        # Note: we've added "F41:V31" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F41:V28', 'F41:V29', 'F41:V30', 'F41:V31'])

compass_df_Instrument['isbudgeted28'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V28']
compass_df_Instrument['isbudgeted29'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V29']
compass_df_Instrument['isbudgeted30'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V30']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F41:V31']
compass_df_Instrument['total28'] = compass_df_Instrument['F41:V28']
compass_df_Instrument['total29'] = compass_df_Instrument['F41:V29']
compass_df_Instrument['total30'] = compass_df_Instrument['F41:V30']
compass_df_Instrument['total31'] = compass_df_Instrument['F41:V31']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-8):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['InstrumentID', 'isbudgeted', 'isbudgeted28', 'isbudgeted29', 'isbudgeted30', 'isbudgeted31', 'total28', 'total29', 'total30', 'total31', 'F41:V28','F41:V29','F41:V30', 'F41:V31'], aggfunc={'InstrumentID':pd.Series.nunique, 'isbudgeted': pd.Series.mean, 'isbudgeted28': sum, 'isbudgeted29': sum, 'isbudgeted30': sum, 'isbudgeted31': sum, 'total28': sum, 'total29': sum, 'total30': sum, 'total31': sum, 'F41:V28': sum, 'F41:V29': sum, 'F41:V30': sum, 'F41:V31': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)
       
compass_df_Instrument['isbudgeted28'] = compass_df_Instrument['isbudgeted28'] / compass_df_Instrument['total28'] 
compass_df_Instrument['isbudgeted29'] = compass_df_Instrument['isbudgeted29'] / compass_df_Instrument['total29'] 
compass_df_Instrument['isbudgeted30'] = compass_df_Instrument['isbudgeted30'] / compass_df_Instrument['total30'] 
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted31'] / compass_df_Instrument['total31'] 

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total28', 'total29', 'total30', 'total31'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F41:V28': round(compass_df_Instrument_bis['F41:V28'].mean(),1), 'F41:V29': round(compass_df_Instrument_bis['F41:V29'].mean(),1), 'F41:V30': round(compass_df_Instrument_bis['F41:V30'].mean(),1), 'F41:V31': round(compass_df_Instrument_bis['F41:V31'].mean(),1), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted28': compass_df_Instrument_bis['isbudgeted28'].mean(), 'isbudgeted29': compass_df_Instrument_bis['isbudgeted29'].mean(), 'isbudgeted30': compass_df_Instrument_bis['isbudgeted30'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].mean(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F41:V28': round(compass_df_Instrument_bis['F41:V28'].median(),1), 'F41:V29': round(compass_df_Instrument_bis['F41:V29'].median(),1), 'F41:V30': round(compass_df_Instrument_bis['F41:V30'].median(),1), 'F41:V31': round(compass_df_Instrument_bis['F41:V31'].median(),1), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].median(), 'isbudgeted28': compass_df_Instrument_bis['isbudgeted28'].median(), 'isbudgeted29': compass_df_Instrument_bis['isbudgeted29'].median(), 'isbudgeted30': compass_df_Instrument_bis['isbudgeted30'].median(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].median(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)


# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']


compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']

for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (compass_df_Instrument_Label['isbudgeted'].loc[x].astype(int)).astype(str) + " / " + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + " ) "
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "

# Prepare the dataframe for graph inclusion:
# Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted28', 'isbudgeted29', 'isbudgeted30', 'isbudgeted31','F41:V28', 'F41:V29', 'F41:V30', 'F41:V31', 'F41:V32'])

# The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F41:V32"] = 0
for x in range(9, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F41:V32"] = compass_df_Instrument["F41:V32"] + compass_df_Instrument.iloc[:,x]
for x in range(9, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F41:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]
    

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data and transpose it
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'InstrumentID', 'isbudgeted', 'isbudgeted28', 'isbudgeted29', 'isbudgeted30', 'isbudgeted31','F41:V28', 'F41:V29', 'F41:V30', 'F41:V31']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Retrieve headers frim Dict_Headers
d = Dict_Headers.set_index('Code').to_dict()
compass_df_Instrument2.columns = compass_df_Instrument2.columns.to_series().map(d['Long Description'])
compass_df_Instrument2

    # melt the dataframe (twice, one for the NBW by timelines, one for the initiatives funded (i.e the color)
compass_df_Instrument3 = compass_df_Instrument2.copy()
new_header3 = list(compass_df_Instrument3.columns[7:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["CountryLabel", "isbudgeted"], value_vars=new_header3, var_name='Variable', value_name='Value')
compass_df_Instrument3 = compass_df_Instrument3.sort_values(["Variable", "CountryLabel", "isbudgeted"]).copy()
compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[3:7].values)
compass_df_Instrument4 = compass_df_Instrument2.melt(id_vars=["CountryLabel", "Total"], value_vars=new_header4, var_name='Variable', value_name='Color')
compass_df_Instrument4 = compass_df_Instrument4.sort_values(["Color", "CountryLabel"]).copy()
compass_df_Instrument3["Color"] = compass_df_Instrument4["Color"]
compass_df_Instrument2 = compass_df_Instrument3.copy()


# amend the label to add the extended labels:
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      compass_df_Instrument_Label, 
                      on ='CountryLabel', 
                      how ='inner')
    # Get rid of intermediate totals (not needed any more)
compass_df_Instrument2['CountryLabel'] = compass_df_Instrument2['CountryLabel2'] 
compass_df_Instrument2 = compass_df_Instrument2.drop(['InstrumentID', 'isbudgeted_x', 'isbudgeted_y', 'CountryLabel2'], axis=1).copy()
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value'].fillna(0)
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)


    # assign numerical values to x-axis values
compass_df_Instrument2["axislocation"] = 1

for x in range(0, compass_df_Instrument2.shape[0]-1):
        if compass_df_Instrument2["Variable"].loc[x] == "3 years or less":
            compass_df_Instrument2["axislocation"].loc[x]  = 0.25
        elif compass_df_Instrument2["Variable"].loc[x] == "4-6 years":
            compass_df_Instrument2["axislocation"].loc[x]  = 0.5
        elif compass_df_Instrument2["Variable"].loc[x] == "7 years or more":
            compass_df_Instrument2["axislocation"].loc[x]  = 0.75
        else:
            compass_df_Instrument2["axislocation"].loc[x]  = 1
                                  
# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"], 
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 5: Budget horizons in institutional funding for public research <br> as a share of total instruments reported (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=450,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Funding horizon',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(0.1, 1.1),
    ticktext=['3 years or less', '4-6 years', '7 years or more', 'Not specified'],
    tickvals=[0.25, 0.5, 0.75, 1],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.show()        
        

On average, most of the OECD’s institutional funding for public research is short term (3 years or less) or mid-term (4-6 years). While Canada follows OECD’s pattern, the other four countries have a longer-term focus in comparison. Japan, Germany, and Korea have a larger bulk of mid-term and long-term initiatives (4-6 years and >7 years). France allocates more budget to a mix of long-term (>7 years) and short-term (3 years or less) initiatives.

4.1.2. Performance-based element to the allocation

Allocation of institutional funding for public research can be done on the basis of performance-based contracts or criteria. Figure 6 looks at the prevalence of performance-based elements to the allocation of funding. For each country, a value is assigned for each initiative: 1 when a performance-based element is reported, and 0 otherwise. For each country, an aggregate ratio is then computed by aggregating these values and the associated initiatives’ budget weight coefficients, to summarise country prevalence of performance-based elements in the allocation of institutional funding.

Show Code
# Data preparation:

    # Copy the central dataframe
compass_df_Instrument2 = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research"
compass_df_Instrument2 = compass_df_Instrument2[(compass_df_Instrument2.InstrumentTypeLabel == "Institutional funding for public research")]

    # Generate a table of initiatives
compass_df_Instrument2 = compass_df_Instrument2[['InstrumentID', 'OECD_STATUS', 'CountryLabel', 'InstrumentTypeLabel', 'CountryCode', 'NBW', 'F38:V4', 'F38:V5', 'isbudgeted']].copy()

    # Weight initiatives by NBW
compass_df_Instrument2['F38:V4'] = compass_df_Instrument2['F38:V4'] * compass_df_Instrument2['NBW']
compass_df_Instrument2['F38:V5'] = compass_df_Instrument2['F38:V5'] * compass_df_Instrument2['NBW']

    # Generate a table of initiatives for each country
compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentTypeLabel'], values=['isbudgeted','InstrumentID', 'F38:V4','F38:V5'], aggfunc={'isbudgeted': pd.Series.mean, 'InstrumentID': pd.Series.nunique, 'F38:V4': sum, 'F38:V5': sum}, fill_value=0).reset_index()


# Introduce a variable 'Ratio' that reflects the propotion of performance-based initiatives to total initiatives
compass_df_Instrument2['Ratio'] = compass_df_Instrument2['F38:V4'] / (compass_df_Instrument2['F38:V4'] + compass_df_Instrument2['F38:V5'])
compass_df_Instrument2['Ratio'] = compass_df_Instrument2['Ratio'].astype(float)


# Complement with OECD data:

    # Generate a dataframe filtered on OECD member only:
compass_df_Instrument2_bis = compass_df_Instrument2[compass_df_Instrument2.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'InstrumentTypeLabel': 'Institutional funding for public research', 'F38:V4': round(compass_df_Instrument2_bis['F38:V4'].mean(),1), 'F38:V5': round(compass_df_Instrument2_bis['F38:V5'].mean(),1), 'Ratio': round((compass_df_Instrument2_bis['F38:V4'].mean()/(compass_df_Instrument2_bis['F38:V4'].mean()+compass_df_Instrument2_bis['F38:V5'].mean())),4), 'InstrumentID': round(compass_df_Instrument2_bis['InstrumentID'].mean(),1), 'isbudgeted': round(compass_df_Instrument2_bis['isbudgeted'].mean(),1)}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'InstrumentTypeLabel': 'Institutional funding for public research', 'F38:V4': round(compass_df_Instrument2_bis['F38:V4'].median(),1), 'F38:V5': round(compass_df_Instrument2_bis['F38:V5'].median(),1), 'Ratio': round((compass_df_Instrument2_bis['F38:V4'].median()/(compass_df_Instrument2_bis['F38:V4'].median()+compass_df_Instrument2_bis['F38:V5'].median())),4), 'InstrumentID': round(compass_df_Instrument2_bis['InstrumentID'].median(),1), 'isbudgeted': round(compass_df_Instrument2_bis['isbudgeted'].median(),1)}

    # Append the OECD data to the dataframes
compass_df_Instrument2 = compass_df_Instrument2.append(new_row, ignore_index=True)



# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument2[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']


for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ")"
                
# Prepare the dataframe for graph inclusion:
    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:
index_names = compass_df_Instrument2[(compass_df_Instrument2['F38:V4'] + compass_df_Instrument2['F38:V5']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)

    # Sort the dataframe in ascending Ratio:
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='Ratio', ascending=False)


    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument2.loc[compass_df_Instrument2["CountryCode"].isin(list(my_countries["CountryCode"]))]


    # update the label to used amended labels

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        on ='CountryLabel', 
                        how ='inner')
    # Get rid of intermediate totals (not needed any more)
compass_df_Instrument2['CountryLabel'] = compass_df_Instrument2['CountryLabel2'] 
compass_df_Instrument2 = compass_df_Instrument2.drop(['InstrumentID_x', 'InstrumentID_y', 'isbudgeted_x', 'isbudgeted_y', 'CountryLabel2'], axis=1).copy()



# Generate the graph:

import plotly.graph_objects as go

country = compass_df_Instrument2['CountryLabel']

Ratio = compass_df_Instrument2['Ratio']

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=Ratio,
    y=country,
    name='Performance based elelement to the allocation',
    marker=dict(
        size=13,
        cmax=1,
        cmin=0,
        color=compass_df_TargetFundingPivot2_filtered['isbudgeted'],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f',
        ),
        line_color='rgba(156, 165, 196, 1.0)',
    )
))

fig.update_traces(mode='markers', marker=dict(line_width=1, symbol='circle', size=13))

fig.update_layout(
    title={
        'text': "Figure 6: 'Institutional funding for public research' instruments <br>with a performance-based allocation element <br> as a share of total instruments within country (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=300,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)
fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.update_xaxes(
    title_text='Performance-based allocation',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(-0.05, 1.1),
    ticktext=["Never", "Half the time", "Always"],
    tickvals=[0.01, 0.5, 1],
    showgrid=True,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.show()

On average, two thirds of OECD area’s weighted institutional funding instruments includes performance criteria in the allocation process. Great variance can be observed across the five selected countries. While Canada allocated funding based on performance criteria for half of the weighted instruments, Japan and Korea use more systematically performance-based criteria. Germany and France gravitate more closely around the OECD mean.

4.1.3. Allocation criteria

Figure 7 focuses on the various types of criteria used to allocate institutional funding for public research. The STIP Compass dataset considers different types of criteria but, as we will see, some are more intensively cited compared to others. The figure aggregates the weighted frequency of allocation criteria by country. The criteria are ranked by degree of importance among the selected countries and are displayed by decreasing order of prevalence. For each country, the bubble sizes are proportional to the criteria's prevalence in the overall pool of weighted instruments.

Show Code
# Data preparation:
    # Copy the central dataframe
compass_df_Instrument2 = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research"
compass_df_Instrument2 = compass_df_Instrument2[(compass_df_Instrument2.InstrumentTypeLabel == "Institutional funding for public research")].copy()

    # Generate a table of initiatives
compass_df_Instrument2 = compass_df_Instrument2[['InstrumentID', 'OECD_STATUS', 'CountryLabel', 'CountryCode', 'isbudgeted', 'NBW', 'F40:V7', 'F40:V8', 'F40:V9', 'F40:V10', 'F40:V11', 'F40:V12', 'F40:V14', 'F40:V13', 'F40:V15', 'F40:V16', 'F40:V17', 'F40:V18', 'F40:V668', 'F40:V800', 'F40:V801', 'F40:V669']].copy()

    # inroduce 2 sets of variables, "isbudgetedXX" and "totalXX"

mapping_labels={'BUDGET':['isbudgeted7','isbudgeted8','isbudgeted9','isbudgeted10','isbudgeted11','isbudgeted12','isbudgeted13','isbudgeted14','isbudgeted15','isbudgeted16','isbudgeted17','isbudgeted18','isbudgeted668','isbudgeted669', 'isbudgeted800', 'isbudgeted801'], 
                'TOTAL':['total7', 'total8', 'total9', 'total10', 'total11', 'total12', 'total13', 'total14', 'total15', 'total16', 'total17', 'total18', 'total668', 'total669', 'total800', 'total801'],
                'VARIABLE':['F40:V7', 'F40:V8', 'F40:V9', 'F40:V10', 'F40:V11', 'F40:V12', 'F40:V13', 'F40:V14', 'F40:V15', 'F40:V16', 'F40:V17', 'F40:V18', 'F40:V668', 'F40:V669', 'F40:V800', 'F40:V801']}

mapping_labels=pd.DataFrame(mapping_labels)

    
compass_df_Instrument2['isbudgeted7'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V7']
compass_df_Instrument2['isbudgeted8'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V8']
compass_df_Instrument2['isbudgeted9'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V9']
compass_df_Instrument2['isbudgeted10'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V10']
compass_df_Instrument2['isbudgeted11'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V11']
compass_df_Instrument2['isbudgeted12'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V12']
compass_df_Instrument2['isbudgeted13'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V13']
compass_df_Instrument2['isbudgeted14'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V14']
compass_df_Instrument2['isbudgeted15'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V15']
compass_df_Instrument2['isbudgeted16'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V16']
compass_df_Instrument2['isbudgeted17'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V17']
compass_df_Instrument2['isbudgeted18'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V18']
compass_df_Instrument2['isbudgeted668'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V668']
compass_df_Instrument2['isbudgeted800'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V800']
compass_df_Instrument2['isbudgeted801'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V801']
compass_df_Instrument2['isbudgeted669'] = compass_df_Instrument2['isbudgeted'] * compass_df_Instrument2['F40:V669']

compass_df_Instrument2['total7'] = compass_df_Instrument2['F40:V7']
compass_df_Instrument2['total8'] = compass_df_Instrument2['F40:V8']
compass_df_Instrument2['total9'] = compass_df_Instrument2['F40:V9']
compass_df_Instrument2['total10'] = compass_df_Instrument2['F40:V10']
compass_df_Instrument2['total11'] = compass_df_Instrument2['F40:V11']
compass_df_Instrument2['total12'] = compass_df_Instrument2['F40:V12']
compass_df_Instrument2['total13'] = compass_df_Instrument2['F40:V13']
compass_df_Instrument2['total14'] = compass_df_Instrument2['F40:V14']
compass_df_Instrument2['total15'] = compass_df_Instrument2['F40:V15']
compass_df_Instrument2['total16'] = compass_df_Instrument2['F40:V16']
compass_df_Instrument2['total17'] = compass_df_Instrument2['F40:V17']
compass_df_Instrument2['total18'] = compass_df_Instrument2['F40:V18']
compass_df_Instrument2['total668'] = compass_df_Instrument2['F40:V668']
compass_df_Instrument2['total800'] = compass_df_Instrument2['F40:V800']
compass_df_Instrument2['total801'] = compass_df_Instrument2['F40:V801']
compass_df_Instrument2['total669'] = compass_df_Instrument2['F40:V669']

    # Weight initiatives by NBW
for x in range(6, (compass_df_Instrument2.shape[1])-32):
    compass_df_Instrument2.iloc[:,x] = compass_df_Instrument2.iloc[:,x] * compass_df_Instrument2['NBW']

    # Introduce a variable 'Total' that reflects the proportion of performance-based initiatives to total initiatives
compass_df_Instrument2['Total'] = compass_df_Instrument2['F40:V7'] + compass_df_Instrument2['F40:V8'] + compass_df_Instrument2['F40:V9'] + compass_df_Instrument2['F40:V10'] + compass_df_Instrument2['F40:V11'] + compass_df_Instrument2['F40:V12'] + compass_df_Instrument2['F40:V14'] + compass_df_Instrument2['F40:V13'] + compass_df_Instrument2['F40:V15'] + compass_df_Instrument2['F40:V16'] + compass_df_Instrument2['F40:V17'] + compass_df_Instrument2['F40:V18'] + compass_df_Instrument2['F40:V668'] + compass_df_Instrument2['F40:V800'] + compass_df_Instrument2['F40:V801'] + compass_df_Instrument2['F40:V669']
compass_df_Instrument2 = compass_df_Instrument2.drop_duplicates()
compass_df_Instrument2['Unique'] = 1


    # Reorder the columns to have variables on the right:
compass_df_Instrument2_isbudgeted = compass_df_Instrument2.copy()

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['Total', 'isbudgeted', 'Unique', 'F40:V7','F40:V8','F40:V9','F40:V10','F40:V11','F40:V12','F40:V13','F40:V14','F40:V15','F40:V16','F40:V17','F40:V18','F40:V668','F40:V800','F40:V801','F40:V669'], aggfunc={'Total': sum, 'Unique': sum, 'isbudgeted': pd.Series.mean, 'F40:V7': sum,'F40:V8': sum,'F40:V9': sum,'F40:V10': sum,'F40:V11': sum,'F40:V12': sum,'F40:V13': sum,'F40:V14': sum,'F40:V15': sum,'F40:V16': sum,'F40:V17': sum,'F40:V18': sum,'F40:V668': sum,'F40:V800': sum,'F40:V801': sum,'F40:V669': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2_isbudgeted, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted7', 'isbudgeted8', 'isbudgeted9', 'isbudgeted10', 'isbudgeted11', 'isbudgeted12', 'isbudgeted13', 'isbudgeted14', 'isbudgeted15', 'isbudgeted16', 'isbudgeted17', 'isbudgeted18', 'isbudgeted668', 'isbudgeted800', 'isbudgeted801', 'isbudgeted669', 'total7', 'total8', 'total9', 'total10', 'total11', 'total12', 'total13', 'total14', 'total15', 'total16', 'total17', 'total18', 'total668', 'total800', 'total801', 'total669'], aggfunc={'isbudgeted7': sum, 'isbudgeted8': sum, 'isbudgeted9': sum, 'isbudgeted10': sum, 'isbudgeted11': sum, 'isbudgeted12': sum, 'isbudgeted13': sum, 'isbudgeted14': sum, 'isbudgeted15': sum, 'isbudgeted16': sum, 'isbudgeted17': sum, 'isbudgeted18': sum, 'isbudgeted668': sum, 'isbudgeted800': sum, 'isbudgeted801': sum, 'isbudgeted669': sum, 'total7': sum, 'total8': sum, 'total9': sum, 'total10': sum, 'total11': sum, 'total12': sum, 'total13': sum, 'total14': sum, 'total15': sum, 'total16': sum, 'total17': sum, 'total18': sum, 'total668': sum, 'total800': sum, 'total801': sum, 'total669': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2_isbudgeted['isbudgeted7'] = compass_df_Instrument2_isbudgeted['isbudgeted7'] / compass_df_Instrument2_isbudgeted['total7'] 
compass_df_Instrument2_isbudgeted['isbudgeted8'] = compass_df_Instrument2_isbudgeted['isbudgeted8'] / compass_df_Instrument2_isbudgeted['total8'] 
compass_df_Instrument2_isbudgeted['isbudgeted9'] = compass_df_Instrument2_isbudgeted['isbudgeted9'] / compass_df_Instrument2_isbudgeted['total9'] 
compass_df_Instrument2_isbudgeted['isbudgeted10'] = compass_df_Instrument2_isbudgeted['isbudgeted10'] / compass_df_Instrument2_isbudgeted['total10'] 
compass_df_Instrument2_isbudgeted['isbudgeted11'] = compass_df_Instrument2_isbudgeted['isbudgeted11'] / compass_df_Instrument2_isbudgeted['total11'] 
compass_df_Instrument2_isbudgeted['isbudgeted12'] = compass_df_Instrument2_isbudgeted['isbudgeted12'] / compass_df_Instrument2_isbudgeted['total12'] 
compass_df_Instrument2_isbudgeted['isbudgeted13'] = compass_df_Instrument2_isbudgeted['isbudgeted13'] / compass_df_Instrument2_isbudgeted['total13'] 
compass_df_Instrument2_isbudgeted['isbudgeted14'] = compass_df_Instrument2_isbudgeted['isbudgeted14'] / compass_df_Instrument2_isbudgeted['total14'] 
compass_df_Instrument2_isbudgeted['isbudgeted15'] = compass_df_Instrument2_isbudgeted['isbudgeted15'] / compass_df_Instrument2_isbudgeted['total15'] 
compass_df_Instrument2_isbudgeted['isbudgeted16'] = compass_df_Instrument2_isbudgeted['isbudgeted16'] / compass_df_Instrument2_isbudgeted['total16'] 
compass_df_Instrument2_isbudgeted['isbudgeted17'] = compass_df_Instrument2_isbudgeted['isbudgeted17'] / compass_df_Instrument2_isbudgeted['total17'] 
compass_df_Instrument2_isbudgeted['isbudgeted18'] = compass_df_Instrument2_isbudgeted['isbudgeted18'] / compass_df_Instrument2_isbudgeted['total18'] 
compass_df_Instrument2_isbudgeted['isbudgeted668'] = compass_df_Instrument2_isbudgeted['isbudgeted668'] / compass_df_Instrument2_isbudgeted['total668'] 
compass_df_Instrument2_isbudgeted['isbudgeted669'] = compass_df_Instrument2_isbudgeted['isbudgeted669'] / compass_df_Instrument2_isbudgeted['total669'] 
compass_df_Instrument2_isbudgeted['isbudgeted800'] = compass_df_Instrument2_isbudgeted['isbudgeted800'] / compass_df_Instrument2_isbudgeted['total800'] 
compass_df_Instrument2_isbudgeted['isbudgeted801'] = compass_df_Instrument2_isbudgeted['isbudgeted801'] / compass_df_Instrument2_isbudgeted['total801'] 

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['total7', 'total8', 'total9', 'total10', 'total11', 'total12', 'total13', 'total14', 'total15', 'total16', 'total17', 'total18', 'total668', 'total669', 'total800', 'total801'], axis=1).copy()
    

# Complement with OECD data:
    # Generate a dataframe filtered on OECD member only:
compass_df_Instrument2_bis = compass_df_Instrument2[compass_df_Instrument2.OECD_STATUS == "Member"]
compass_df_Instrument2_isbudgeted_bis = compass_df_Instrument2_isbudgeted[compass_df_Instrument2_isbudgeted.OECD_STATUS == "Member"]

    # Compute median (doesnt make much sense here)
new_row = {'CountryLabel':'OECD Median', 'OECD_STATUS': "Member", 'CountryCode':'OECD', 'Total': compass_df_Instrument2_bis['Total'].median(), 'Unique': compass_df_Instrument2_bis['Unique'].median(), 'F40:V7': compass_df_Instrument2_bis['F40:V7'].median(), 
           'F40:V8': compass_df_Instrument2_bis['F40:V8'].median(), 'F40:V9': compass_df_Instrument2_bis['F40:V9'].median(), 'F40:V10': compass_df_Instrument2_bis['F40:V10'].median(), 'F40:V11': compass_df_Instrument2_bis['F40:V11'].median(), 'F40:V12': compass_df_Instrument2_bis['F40:V12'].median(), 'F40:V13': compass_df_Instrument2_bis['F40:V13'].median(), 
           'F40:V14': compass_df_Instrument2_bis['F40:V14'].median(), 'F40:V15': compass_df_Instrument2_bis['F40:V15'].median(), 'F40:V16': compass_df_Instrument2_bis['F40:V16'].median(), 'F40:V17': compass_df_Instrument2_bis['F40:V17'].median(), 'F40:V18': compass_df_Instrument2_bis['F40:V18'].median(), 'F40:V668': compass_df_Instrument2_bis['F40:V668'].median(), 'F40:V800': compass_df_Instrument2_bis['F40:V800'].median(), 'F40:V801': compass_df_Instrument2_bis['F40:V801'].median(), 
           'F40:V669': compass_df_Instrument2_bis['F40:V669'].median(), 'isbudgeted': compass_df_Instrument2_bis['isbudgeted'].median()}

new_row_isbudgeted = {'CountryLabel':'OECD Median', 'OECD_STATUS': "Member", 'CountryCode':'OECD', 'isbudgeted7': compass_df_Instrument2_isbudgeted_bis['isbudgeted7'].median(), 'isbudgeted8': compass_df_Instrument2_isbudgeted_bis['isbudgeted8'].median(), 'isbudgeted9': compass_df_Instrument2_isbudgeted_bis['isbudgeted9'].median(), 
           'isbudgeted10': compass_df_Instrument2_isbudgeted_bis['isbudgeted10'].median(), 'isbudgeted11': compass_df_Instrument2_isbudgeted_bis['isbudgeted11'].median(), 'isbudgeted12': compass_df_Instrument2_isbudgeted_bis['isbudgeted12'].median(), 'isbudgeted13': compass_df_Instrument2_isbudgeted_bis['isbudgeted13'].median(), 'isbudgeted14': compass_df_Instrument2_isbudgeted_bis['isbudgeted14'].median(), 'isbudgeted15': compass_df_Instrument2_isbudgeted_bis['isbudgeted15'].median(), 
           'isbudgeted16': compass_df_Instrument2_isbudgeted_bis['isbudgeted16'].median(), 'isbudgeted17': compass_df_Instrument2_isbudgeted_bis['isbudgeted17'].median(), 'isbudgeted18': compass_df_Instrument2_isbudgeted_bis['isbudgeted18'].median(), 'isbudgeted668': compass_df_Instrument2_isbudgeted_bis['isbudgeted668'].median(), 'isbudgeted669': compass_df_Instrument2_isbudgeted_bis['isbudgeted669'].median(), 'isbudgeted800': compass_df_Instrument2_isbudgeted_bis['isbudgeted800'].median(), 'isbudgeted801': compass_df_Instrument2_isbudgeted_bis['isbudgeted801'].median()}


    # Compute mean
new_row2 = {'CountryLabel':'OECD Mean', 'OECD_STATUS': "Member", 'CountryCode':'OECD2', 'Total': compass_df_Instrument2_bis['Total'].mean(), 'Unique': compass_df_Instrument2_bis['Unique'].mean(), 'F40:V7': compass_df_Instrument2_bis['F40:V7'].mean(), 
           'F40:V8': compass_df_Instrument2_bis['F40:V8'].mean(), 'F40:V9': compass_df_Instrument2_bis['F40:V9'].mean(), 'F40:V10': compass_df_Instrument2_bis['F40:V10'].mean(), 'F40:V11': compass_df_Instrument2_bis['F40:V11'].mean(), 'F40:V12': compass_df_Instrument2_bis['F40:V12'].mean(), 'F40:V13': compass_df_Instrument2_bis['F40:V13'].mean(), 
           'F40:V14': compass_df_Instrument2_bis['F40:V14'].mean(), 'F40:V15': compass_df_Instrument2_bis['F40:V15'].mean(), 'F40:V16': compass_df_Instrument2_bis['F40:V16'].mean(), 'F40:V17': compass_df_Instrument2_bis['F40:V17'].mean(), 'F40:V18': compass_df_Instrument2_bis['F40:V18'].mean(), 'F40:V668': compass_df_Instrument2_bis['F40:V668'].mean(), 'F40:V800': compass_df_Instrument2_bis['F40:V800'].mean(), 'F40:V801': compass_df_Instrument2_bis['F40:V801'].mean(), 
           'F40:V669': compass_df_Instrument2_bis['F40:V669'].mean(), 'isbudgeted': compass_df_Instrument2_bis['isbudgeted'].mean()}
new_row2_isbudgeted = {'CountryLabel':'OECD Mean', 'OECD_STATUS': "Member", 'CountryCode':'OECD2', 'isbudgeted7': compass_df_Instrument2_isbudgeted_bis['isbudgeted7'].mean(), 'isbudgeted8': compass_df_Instrument2_isbudgeted_bis['isbudgeted8'].mean(), 'isbudgeted9': compass_df_Instrument2_isbudgeted_bis['isbudgeted9'].mean(), 
           'isbudgeted10': compass_df_Instrument2_isbudgeted_bis['isbudgeted10'].mean(), 'isbudgeted11': compass_df_Instrument2_isbudgeted_bis['isbudgeted11'].mean(), 'isbudgeted12': compass_df_Instrument2_isbudgeted_bis['isbudgeted12'].mean(), 'isbudgeted13': compass_df_Instrument2_isbudgeted_bis['isbudgeted13'].mean(), 'isbudgeted14': compass_df_Instrument2_isbudgeted_bis['isbudgeted14'].mean(), 'isbudgeted15': compass_df_Instrument2_isbudgeted_bis['isbudgeted15'].mean(), 
           'isbudgeted16': compass_df_Instrument2_isbudgeted_bis['isbudgeted16'].mean(), 'isbudgeted17': compass_df_Instrument2_isbudgeted_bis['isbudgeted17'].mean(), 'isbudgeted18': compass_df_Instrument2_isbudgeted_bis['isbudgeted18'].mean(), 'isbudgeted668': compass_df_Instrument2_isbudgeted_bis['isbudgeted668'].mean(), 'isbudgeted669': compass_df_Instrument2_isbudgeted_bis['isbudgeted669'].mean(), 'isbudgeted800': compass_df_Instrument2_isbudgeted_bis['isbudgeted800'].mean(), 'isbudgeted801': compass_df_Instrument2_isbudgeted_bis['isbudgeted801'].mean()}

    # Add new mean/median lines
compass_df_Instrument2 = compass_df_Instrument2.append(new_row2, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(new_row2_isbudgeted, ignore_index=True)


    # Lets redo the total as OECD median total isnt correct
compass_df_Instrument2['Total'] = compass_df_Instrument2['F40:V7'] + compass_df_Instrument2['F40:V8'] + compass_df_Instrument2['F40:V9'] + compass_df_Instrument2['F40:V10'] + compass_df_Instrument2['F40:V11'] + compass_df_Instrument2['F40:V12'] + compass_df_Instrument2['F40:V14'] + compass_df_Instrument2['F40:V13'] + compass_df_Instrument2['F40:V15'] + compass_df_Instrument2['F40:V16'] + compass_df_Instrument2['F40:V17'] + compass_df_Instrument2['F40:V18'] + compass_df_Instrument2['F40:V668'] + compass_df_Instrument2['F40:V800'] + compass_df_Instrument2['F40:V801'] + compass_df_Instrument2['F40:V669']

    # Get rid of OECD_STATUS (not needed any more)
compass_df_Instrument2 = compass_df_Instrument2.drop(['OECD_STATUS'], axis=1).copy()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['OECD_STATUS'], axis=1).copy()


# generate alternative country label with funded initiative / total initiative (for both tabled)
compass_df_Instrument2_Label = compass_df_Instrument2[['CountryLabel', 'Unique', 'isbudgeted']].copy()
compass_df_Instrument2_Label['isbudgeted'] = compass_df_Instrument2_Label['isbudgeted'] * compass_df_Instrument2_Label['Unique']
compass_df_Instrument2_Label['CountryLabel2'] = compass_df_Instrument2_Label['CountryLabel']

    # for table compass_df_Instrument2
for x in range(0, compass_df_Instrument2_Label.shape[0]):
    if compass_df_Instrument2_Label["CountryLabel"].loc[x] == "OECD Mean":
        #compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['isbudgeted'].loc[x],2).astype(float)).astype(str) + " / " +  (round(compass_df_Instrument2_Label['Unique'].loc[x],2).astype(float)).astype(str) + ")"
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['Unique'].loc[x],2).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (compass_df_Instrument2_Label['Unique'].loc[x].astype(int)).astype(str) + ")"
compass_df_Instrument2_Label = compass_df_Instrument2_Label.drop(['Unique'], axis=1).copy()


# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['Unique']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']


        # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument2.loc[compass_df_Instrument2["CountryCode"].isin(list(my_countries["CountryCode"]))]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.loc[compass_df_Instrument2_isbudgeted["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Remove the fields 'Total', 'Unique' and 'isbudgeted' (not really necessary)
compass_df_Instrument2 = compass_df_Instrument2.drop(['Total', 'Unique', 'isbudgeted'], axis=1).copy()

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2_filtered_OECD.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()

list_mean_order = {'CountryLabel':'TEST', 'F40:V10': compass_df_Instrument2_bis['F40:V10'].mean(), 'F40:V11': compass_df_Instrument2_bis['F40:V11'].mean(), 'F40:V12': compass_df_Instrument2_bis['F40:V12'].mean(), 'F40:V13': compass_df_Instrument2_bis['F40:V13'].mean(), 
                   'F40:V14': compass_df_Instrument2_bis['F40:V14'].mean(), 'F40:V15': compass_df_Instrument2_bis['F40:V15'].mean(), 'F40:V16': compass_df_Instrument2_bis['F40:V16'].mean(), 'F40:V17': compass_df_Instrument2_bis['F40:V17'].mean(), 'F40:V18': compass_df_Instrument2_bis['F40:V18'].mean(), 'F40:V668': compass_df_Instrument2_bis['F40:V668'].mean(), 
                   'F40:V669': compass_df_Instrument2_bis['F40:V669'].mean(), 'F40:V7': compass_df_Instrument2_bis['F40:V7'].mean(), 'F40:V8': compass_df_Instrument2_bis['F40:V8'].mean(), 'F40:V800': compass_df_Instrument2_bis['F40:V800'].mean(), 'F40:V801': compass_df_Instrument2_bis['F40:V801'].mean(), 'F40:V9': compass_df_Instrument2_bis['F40:V9'].mean()}
list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted10': compass_df_Instrument2_bis['F40:V10'].mean(), 'isbudgeted11': compass_df_Instrument2_bis['F40:V11'].mean(), 'isbudgeted12': compass_df_Instrument2_bis['F40:V12'].mean(), 'isbudgeted13': compass_df_Instrument2_bis['F40:V13'].mean(), 
                   'isbudgeted14': compass_df_Instrument2_bis['F40:V14'].mean(), 'isbudgeted15': compass_df_Instrument2_bis['F40:V15'].mean(), 'isbudgeted16': compass_df_Instrument2_bis['F40:V16'].mean(), 'isbudgeted17': compass_df_Instrument2_bis['F40:V17'].mean(), 'isbudgeted18': compass_df_Instrument2_bis['F40:V18'].mean(), 'isbudgeted668': compass_df_Instrument2_bis['F40:V668'].mean(), 
                   'isbudgeted669': compass_df_Instrument2_bis['F40:V669'].mean(), 'isbudgeted7': compass_df_Instrument2_bis['F40:V7'].mean(), 'isbudgeted8': compass_df_Instrument2_bis['F40:V8'].mean(), 'isbudgeted800': compass_df_Instrument2_bis['F40:V800'].mean(), 'isbudgeted801': compass_df_Instrument2_bis['F40:V801'].mean(), 'isbudgeted9': compass_df_Instrument2_bis['F40:V9'].mean()}

compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)

# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')

compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument2_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)


# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n/100)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 7: Allocation criteria for institutional funding for public research <br> as a share of total instruments reported (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=1000,
    height=750,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Allocation criteria considered',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(-0.5, 15.5),
    ticktext= My_position['Long Description'],
    tickvals=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)

fig.show()        

Allocation of institutional funding for public research is made using a wide array of criteria. Research impact is the main one on average, followed by research publications and outputs, scientific partnerships and collaborations, and student enrolment or attainment rates. Some country preferences for certain criteria are apparent in the figure. Canada gears towards alignment with research impacts, followed by national research priorities. In Canada, “other” is particularly used to indicate that allocations are determined at the federal level. Japan is particularly geared towards research impacts, followed by scientific partnerships and collaborations. Besides research impacts, Germany favours research publications and the budgets allocated in previous years as funding criteria. Compared to other countries, Korea and France use a more diverse set of allocation criteria.

4.2. Project grants for public research

This second series of graphs looks at project grants for public research. These are direct allocations of funding to individual researchers or public research organisations seeking to finance all or part of a research project. They include grant schemes that vary from simplistic, one-off funding allocations to complex strategic programs built on formal public-private partnerships.

4.2.1. Type of research activity funded

Project grants for public research can support different kinds of activities. The STIP Compass dataset distinguishes between applied research, basic research, multidisciplinary research, experimental development and demonstration/testing. Figure 8 maps the activities funded by research grants in the selected five countries and the OECD mean. More than one activity can be targeted by a given grant instrument. The frequency of each activity is aggregated using the NBW coefficients. The resulting frequencies are then divided by the total weighted grant instruments reported by the country, to obtain the relative prevalence of each research activity. For each country, each bubble size is proportional to the corresponding research activity's share in the overall pool of weighted project grants for public research.

Show Code
# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research":
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Project grants for public research")]


    # Generate a table of initiatives by time-period for each country
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentTypeLabel', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805'], aggfunc={'NBW': pd.Series.mean, 'isbudgeted': pd.Series.mean, 'F76:V33': sum, 'F76:V34': sum, 'F76:V35': sum, 'F76:V804': sum, 'F76:V805': sum}, fill_value=0).reset_index()

    # Add an extra column to account for missing timeline:
compass_df_Instrument["F76:V31"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F76:V31"] = compass_df_Instrument["F76:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F76:V31"].loc[x] != 0:
        compass_df_Instrument["F76:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F76:V31"].loc[x]  = 1
        # Note: we've added "F76:V31" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805', 'F76:V31'])

compass_df_Instrument['isbudgeted33'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V33']
compass_df_Instrument['isbudgeted34'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V34']
compass_df_Instrument['isbudgeted35'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V35']
compass_df_Instrument['isbudgeted804'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V804']
compass_df_Instrument['isbudgeted805'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V805']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F76:V31']
compass_df_Instrument['total33'] = compass_df_Instrument['F76:V33']
compass_df_Instrument['total34'] = compass_df_Instrument['F76:V34']
compass_df_Instrument['total35'] = compass_df_Instrument['F76:V35']
compass_df_Instrument['total804'] = compass_df_Instrument['F76:V804']
compass_df_Instrument['total805'] = compass_df_Instrument['F76:V805']
compass_df_Instrument['total31'] = compass_df_Instrument['F76:V31']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-12):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'isbudgeted33', 'isbudgeted34', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805', 'isbudgeted31', 'total33', 'total34', 'total35', 'total804', 'total805','total31','F76:V33','F76:V34', 'F76:V35','F76:V804','F76:V805','F76:V31'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'isbudgeted33': sum, 'isbudgeted34': sum, 'isbudgeted35': sum, 'isbudgeted804': sum, 'isbudgeted805': sum, 'isbudgeted31': sum, 'total33': sum, 'total34': sum, 'total35': sum, 'total804': sum, 'total805': sum,'total31': sum,'F76:V33': sum,'F76:V34': sum, 'F76:V35': sum,'F76:V804': sum,'F76:V805': sum,'F76:V31': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()

index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)
       
compass_df_Instrument['isbudgeted33'] = compass_df_Instrument['isbudgeted33'] / compass_df_Instrument['total33'] 
compass_df_Instrument['isbudgeted34'] = compass_df_Instrument['isbudgeted34'] / compass_df_Instrument['total34'] 
compass_df_Instrument['isbudgeted35'] = compass_df_Instrument['isbudgeted35'] / compass_df_Instrument['total35'] 
compass_df_Instrument['isbudgeted804'] = compass_df_Instrument['isbudgeted804'] / compass_df_Instrument['total804'] 
compass_df_Instrument['isbudgeted805'] = compass_df_Instrument['isbudgeted805'] / compass_df_Instrument['total805'] 
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted31'] / compass_df_Instrument['total31'] 


# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total33', 'total34', 'total35', 'total804', 'total805', 'total31'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F76:V31': round(compass_df_Instrument_bis['F76:V31'].mean(),1), 'F76:V33': round(compass_df_Instrument_bis['F76:V33'].mean(),1), 'F76:V34': round(compass_df_Instrument_bis['F76:V34'].mean(),1), 'F76:V35': round(compass_df_Instrument_bis['F76:V35'].mean(),1), 'F76:V804': compass_df_Instrument_bis['F76:V804'].mean(), 'F76:V805': compass_df_Instrument_bis['F76:V805'].mean(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].mean(), 'isbudgeted33': compass_df_Instrument_bis['isbudgeted33'].mean(), 'isbudgeted34': compass_df_Instrument_bis['isbudgeted34'].mean(), 'isbudgeted35': compass_df_Instrument_bis['isbudgeted35'].mean(), 'isbudgeted804': compass_df_Instrument_bis['isbudgeted804'].mean(), 'isbudgeted805': compass_df_Instrument_bis['isbudgeted805'].mean(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F76:V31': round(compass_df_Instrument_bis['F76:V31'].median(),1), 'F76:V33': round(compass_df_Instrument_bis['F76:V33'].median(),1), 'F76:V34': round(compass_df_Instrument_bis['F76:V34'].median(),1), 'F76:V35': round(compass_df_Instrument_bis['F76:V35'].median(),1), 'F76:V804': compass_df_Instrument_bis['F76:V804'].median(), 'F76:V805': compass_df_Instrument_bis['F76:V805'].median(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].median(), 'isbudgeted33': compass_df_Instrument_bis['isbudgeted33'].median(), 'isbudgeted34': compass_df_Instrument_bis['isbudgeted34'].median(), 'isbudgeted35': compass_df_Instrument_bis['isbudgeted35'].median(), 'isbudgeted804': compass_df_Instrument_bis['isbudgeted804'].median(), 'isbudgeted805': compass_df_Instrument_bis['isbudgeted805'].median(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)


# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']


compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']

for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "
           
# Prepare the dataframe for graph inclusion:
    # Reorder the columns to have variables on the right:
compass_df_Instrument2 = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted33', 'isbudgeted34', 'isbudgeted31', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805','F76:V31', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805'])

    # generate alternative country label with funded initiative / total initiative (for both tabled)
compass_df_Instrument2_Label = compass_df_Instrument2[['CountryLabel', 'InstrumentID', 'isbudgeted']].copy()
compass_df_Instrument2_Label['isbudgeted'] = compass_df_Instrument2_Label['isbudgeted']
compass_df_Instrument2_Label['CountryLabel2'] = compass_df_Instrument2_Label['CountryLabel']

    # for table compass_df_Instrument2
for x in range(0, compass_df_Instrument2_Label.shape[0]):
    if compass_df_Instrument2_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['isbudgeted'].loc[x],2).astype(float)).astype(str) + " / " +  (round(compass_df_Instrument2_Label['InstrumentID'].loc[x],2).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (compass_df_Instrument2_Label['isbudgeted'].loc[x].astype(int)).astype(str) + " / " +  (compass_df_Instrument2_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ")"

# The dataframe obtained shows counts of initiatives - Compute the proportion instead
compass_df_Instrument2["F76:V32"] = 0
for x in range(11, compass_df_Instrument2.shape[1]-1):
    compass_df_Instrument2["F76:V32"] = compass_df_Instrument2["F76:V32"] + compass_df_Instrument2.iloc[:,x]
for x in range(11, compass_df_Instrument2.shape[1]):
    compass_df_Instrument2.iloc[:,x] = compass_df_Instrument2.iloc[:,x] / compass_df_Instrument2["F76:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]   

# Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument2.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data and transpose it
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted31', 'isbudgeted33', 'isbudgeted34', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805','F76:V31', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted31', 'isbudgeted33', 'isbudgeted34', 'isbudgeted35', 'isbudgeted804', 'isbudgeted805'], aggfunc={'isbudgeted31': sum, 'isbudgeted33': sum, 'isbudgeted34': sum, 'isbudgeted35': sum, 'isbudgeted804': sum, 'isbudgeted805': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F76:V31', 'F76:V33', 'F76:V34', 'F76:V35', 'F76:V804', 'F76:V805'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F76:V31': sum, 'F76:V33': sum, 'F76:V34': sum, 'F76:V35': sum, 'F76:V804': sum, 'F76:V805': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F76:V31'] + compass_df_Instrument2['F76:V33'] + compass_df_Instrument2['F76:V34'] + compass_df_Instrument2['F76:V35'] + compass_df_Instrument2['F76:V804'] + compass_df_Instrument2['F76:V805']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']


# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F76:V31': compass_df_Instrument2['F76:V31'].mean(), 'F76:V33': compass_df_Instrument2['F76:V33'].mean(), 'F76:V34': compass_df_Instrument2['F76:V34'].mean(), 'F76:V35': compass_df_Instrument2['F76:V35'].mean(), 
                   'F76:V804': compass_df_Instrument2['F76:V804'].mean(), 'F76:V805': compass_df_Instrument2['F76:V805'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted31': compass_df_Instrument2['F76:V31'].mean(), 'isbudgeted33': compass_df_Instrument2['F76:V33'].mean(), 'isbudgeted34': compass_df_Instrument2['F76:V34'].mean(), 'isbudgeted35': compass_df_Instrument2['F76:V35'].mean(), 
                   'isbudgeted804': compass_df_Instrument2['F76:V804'].mean(), 'isbudgeted805': compass_df_Instrument2['F76:V805'].mean()}

compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()

# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)


# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')

compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)



# Generate the graph:

import plotly.graph_objects as go
compass_df_Instrument2["Value"] = compass_df_Instrument2["Value"]/100
Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 8: Type of research activity funded by project grants for public research  <br> as a share of total instruments reported (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Type of research activity funded',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'], 
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()  

In the OECD area, applied research is the main funded activity, followed closely by basic research. Four out of five of the selected countries follow this same pattern, with the exception of France, where basic research is relatively more funded than applied research. The prevalence of basic research funding seems lower in Japan compared to others. Multidisciplinary research stands out as a prevalent activity in Japan, Germany and Korea in relation to those of other countries. Experimental development and demonstration/testing are cited more often in France and Korea.

4.2.2. Forms of collaboration

Figure 9 focuses on the types of collaborations required to allocate grants for public research. These grants at times require the participation of industry partners, international partners, users of research outputs, or other public research actors. . The frequency of each form of collaboration is aggregated using the NBW coefficients. The resulting frequencies are then divided by the total weighted grant instruments reported by the country, to obtain the relative prevalence of each type of collaboration. For each country, each bubble size is proportional to the collaboration's share in the overall pool of weighted project grants for public research.

Show Code
# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Institutional funding for public research":
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Project grants for public research")]

#Aggregate data by InstrumentID
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V806', 'F77:V41'], aggfunc={'NBW': sum, 'isbudgeted': pd.Series.mean, 'F77:V37':sum, 'F77:V38':sum, 'F77:V39':sum, 'F77:V40':sum, 'F77:V806':sum, 'F77:V41': sum}, fill_value=0).reset_index()

    # Add an extra column to account for missing values:
compass_df_Instrument["F77:V31"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F77:V31"] = compass_df_Instrument["F77:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F77:V31"].loc[x] != 0:
        compass_df_Instrument["F77:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F77:V31"].loc[x]  = 1
        # Note: we've added "F77:V31" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806'])
compass_df_Instrument['isbudgeted37'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V37']
compass_df_Instrument['isbudgeted38'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V38']
compass_df_Instrument['isbudgeted39'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V39']
compass_df_Instrument['isbudgeted40'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V40']
compass_df_Instrument['isbudgeted41'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V41']
compass_df_Instrument['isbudgeted806'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F77:V806']
compass_df_Instrument['total37'] = compass_df_Instrument['F77:V37']
compass_df_Instrument['total38'] = compass_df_Instrument['F77:V38']
compass_df_Instrument['total39'] = compass_df_Instrument['F77:V39']
compass_df_Instrument['total40'] = compass_df_Instrument['F77:V40']
compass_df_Instrument['total41'] = compass_df_Instrument['F77:V41']
compass_df_Instrument['total806'] = compass_df_Instrument['F77:V806']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-12):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806', 'total37', 'total38', 'total39', 'total40', 'total41', 'total806'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'isbudgeted37': sum, 'isbudgeted38': sum, 'isbudgeted39': sum, 'isbudgeted40': sum, 'isbudgeted41': sum, 'isbudgeted806': sum, 'F77:V37': sum, 'F77:V38': sum, 'F77:V39': sum, 'F77:V40': sum, 'F77:V41': sum, 'F77:V806': sum, 'total37': sum, 'total38': sum, 'total39': sum, 'total40': sum, 'total41': sum, 'total806': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)

compass_df_Instrument['isbudgeted37'] = compass_df_Instrument['isbudgeted37'] / compass_df_Instrument['total37'] 
compass_df_Instrument['isbudgeted38'] = compass_df_Instrument['isbudgeted38'] / compass_df_Instrument['total38'] 
compass_df_Instrument['isbudgeted39'] = compass_df_Instrument['isbudgeted39'] / compass_df_Instrument['total39'] 
compass_df_Instrument['isbudgeted40'] = compass_df_Instrument['isbudgeted40'] / compass_df_Instrument['total40'] 
compass_df_Instrument['isbudgeted41'] = compass_df_Instrument['isbudgeted41'] / compass_df_Instrument['total41'] 
compass_df_Instrument['isbudgeted806'] = compass_df_Instrument['isbudgeted806'] / compass_df_Instrument['total806'] 

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total37', 'total38', 'total39', 'total40', 'total41', 'total806'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F77:V37': round(compass_df_Instrument_bis['F77:V37'].mean(),1), 'F77:V38': round(compass_df_Instrument_bis['F77:V38'].mean(),1), 'F77:V39': round(compass_df_Instrument_bis['F77:V39'].mean(),1), 'F77:V40': round(compass_df_Instrument_bis['F77:V40'].mean(),1), 'F77:V41': compass_df_Instrument_bis['F77:V41'].mean(), 'F77:V806': compass_df_Instrument_bis['F77:V806'].mean(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted37': compass_df_Instrument_bis['isbudgeted37'].mean(), 'isbudgeted38': compass_df_Instrument_bis['isbudgeted38'].mean(), 'isbudgeted39': compass_df_Instrument_bis['isbudgeted39'].mean(), 'isbudgeted40': compass_df_Instrument_bis['isbudgeted40'].mean(), 'isbudgeted41': compass_df_Instrument_bis['isbudgeted41'].mean(), 'isbudgeted806': compass_df_Instrument_bis['isbudgeted806'].mean(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F77:V37': round(compass_df_Instrument_bis['F77:V37'].median(),1), 'F77:V38': round(compass_df_Instrument_bis['F77:V38'].median(),1), 'F77:V39': round(compass_df_Instrument_bis['F77:V39'].median(),1), 'F77:V40': round(compass_df_Instrument_bis['F77:V40'].median(),1), 'F77:V41': compass_df_Instrument_bis['F77:V41'].median(), 'F77:V806': compass_df_Instrument_bis['F77:V806'].median(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted37': compass_df_Instrument_bis['isbudgeted37'].median(), 'isbudgeted38': compass_df_Instrument_bis['isbudgeted38'].median(), 'isbudgeted39': compass_df_Instrument_bis['isbudgeted39'].median(), 'isbudgeted40': compass_df_Instrument_bis['isbudgeted40'].median(), 'isbudgeted41': compass_df_Instrument_bis['isbudgeted41'].median(), 'isbudgeted806': compass_df_Instrument_bis['isbudgeted806'].median(), 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)

# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']

for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "     

# Prepare the dataframe for graph inclusion:
    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806','F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806'])

    # generate alternative country label with funded initiative / total initiative (for both tabled)
compass_df_Instrument2_Label = compass_df_Instrument[['CountryLabel', 'InstrumentID', 'isbudgeted']].copy()
compass_df_Instrument2_Label['isbudgeted'] = compass_df_Instrument2_Label['isbudgeted']
compass_df_Instrument2_Label['CountryLabel2'] = compass_df_Instrument2_Label['CountryLabel']
for x in range(0, compass_df_Instrument2_Label.shape[0]):
    if compass_df_Instrument2_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (round(compass_df_Instrument2_Label['isbudgeted'].loc[x],2).astype(float)).astype(str) + " / " +  (round(compass_df_Instrument2_Label['InstrumentID'].loc[x],2).astype(float)).astype(str) + ")"
    else:
        compass_df_Instrument2_Label['CountryLabel2'].loc[x] = compass_df_Instrument2_Label['CountryLabel'].loc[x] + " (" +  (compass_df_Instrument2_Label['isbudgeted'].loc[x].astype(int)).astype(str) + " / " +  (compass_df_Instrument2_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ")"

    # The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F77:V32"] = 0
for x in range(11, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F77:V32"] = compass_df_Instrument["F77:V32"] + compass_df_Instrument.iloc[:,x]
for x in range(11, compass_df_Instrument.shape[1]):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F77:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806','F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted37', 'isbudgeted38', 'isbudgeted39', 'isbudgeted40', 'isbudgeted41', 'isbudgeted806'], aggfunc={'isbudgeted37': sum, 'isbudgeted38': sum, 'isbudgeted39': sum, 'isbudgeted40': sum, 'isbudgeted41': sum, 'isbudgeted806': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F77:V37', 'F77:V38', 'F77:V39', 'F77:V40', 'F77:V41', 'F77:V806'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F77:V37': sum, 'F77:V38': sum, 'F77:V39': sum, 'F77:V40': sum, 'F77:V41': sum, 'F77:V806': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F77:V37'] + compass_df_Instrument2['F77:V38'] + compass_df_Instrument2['F77:V39'] + compass_df_Instrument2['F77:V40'] + compass_df_Instrument2['F77:V41'] + compass_df_Instrument2['F77:V806']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F77:V37': compass_df_Instrument2['F77:V37'].mean(), 'F77:V38': compass_df_Instrument2['F77:V38'].mean(), 'F77:V39': compass_df_Instrument2['F77:V39'].mean(), 'F77:V40': compass_df_Instrument2['F77:V40'].mean(), 
                   'F77:V41': compass_df_Instrument2['F77:V41'].mean(), 'F77:V806': compass_df_Instrument2['F77:V806'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted37': compass_df_Instrument2['F77:V37'].mean(), 'isbudgeted38': compass_df_Instrument2['F77:V38'].mean(), 'isbudgeted39': compass_df_Instrument2['F77:V39'].mean(), 'isbudgeted40': compass_df_Instrument2['F77:V40'].mean(), 
                   'isbudgeted41': compass_df_Instrument2['F77:V41'].mean(), 'isbudgeted806': compass_df_Instrument2['F77:V806'].mean()}

compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()

# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)


# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')

compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value']/100

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)



# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 9: Type of research activity funded by project grants for public research  <br> as a share of total instruments reported (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Type of research activity funded',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'],
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()  

Collaboration is a common requirement in project grants for public research across countries, both for the selected five countries and the OECD mean. The different forms of collaborations tend to be balanced in the database, with the exceptions of users of research output and with other partners, which are relatively more marginal. As a requirement, collaboration with international partners is relatively more prevalent in Japan, whereas Germany and Canada tend to promote collaboration within public research system actors. Compared to other countries, France display more support for project grants involving industry partners.

4.2.3. Selection criteria

Figure 10 displays the prevalence of selection criteria for the allocation of research grants. These grants are allocated on a competitive basis, selecting from a number of candidate projects on the basis of certain favoured parameters (e.g. societal impact anticipated and alignment with national research priorities. Using the same approach as above, the chart aggregates the weighted policy instrument data reported by the five selected countries to indicate the prevalence of each criterion among the total pool of research grants reported.

Show Code
# Data preparation:

    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

# Filter on initiatives linked to "Institutional funding for public research"
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Project grants for public research")]

    # Aggregate data by InstrumentID
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50'], aggfunc={'NBW': pd.Series.mean, 'isbudgeted': pd.Series.mean, 'F78:V43': sum, 'F78:V44': sum, 'F78:V45': sum, 'F78:V46': sum, 'F78:V807': sum, 'F78:V808': sum, 'F78:V47': sum, 'F78:V48': sum, 'F78:V49': sum, 'F78:V50': sum}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Add an extra column to account for missing values:
compass_df_Instrument["F78:V31"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F78:V31"] = compass_df_Instrument["F78:V31"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F78:V31"].loc[x] != 0:
        compass_df_Instrument["F78:V31"].loc[x]  = 0
    else:
        compass_df_Instrument["F78:V31"].loc[x]  = 1


    # introduce 2 sets of variables, "isbudgetedXX" and "totalXX"
compass_df_Instrument['isbudgeted43'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V43']
compass_df_Instrument['isbudgeted44'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V44']
compass_df_Instrument['isbudgeted45'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V45']
compass_df_Instrument['isbudgeted46'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V46']
compass_df_Instrument['isbudgeted807'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V807']
compass_df_Instrument['isbudgeted808'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V808']
compass_df_Instrument['isbudgeted47'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V47']
compass_df_Instrument['isbudgeted48'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V48']
compass_df_Instrument['isbudgeted49'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V49']
compass_df_Instrument['isbudgeted50'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V50']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F78:V31']
compass_df_Instrument['total43'] = compass_df_Instrument['F78:V43']
compass_df_Instrument['total44'] = compass_df_Instrument['F78:V44']
compass_df_Instrument['total45'] = compass_df_Instrument['F78:V45']
compass_df_Instrument['total46'] = compass_df_Instrument['F78:V46']
compass_df_Instrument['total807'] = compass_df_Instrument['F78:V807']
compass_df_Instrument['total808'] = compass_df_Instrument['F78:V808']
compass_df_Instrument['total47'] = compass_df_Instrument['F78:V47']
compass_df_Instrument['total48'] = compass_df_Instrument['F78:V48']
compass_df_Instrument['total49'] = compass_df_Instrument['F78:V49']
compass_df_Instrument['total50'] = compass_df_Instrument['F78:V50']
compass_df_Instrument['total31'] = compass_df_Instrument['F78:V31']

    # Weight initiatives by NBW
for x in range(4, (compass_df_Instrument.shape[1])-25):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31', 'total43', 'total44', 'total45', 'total46', 'total807', 'total808', 'total47', 'total48', 'total49', 'total50', 'total31'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'isbudgeted43': sum, 'isbudgeted44': sum, 'isbudgeted45': sum, 'isbudgeted46': sum, 'isbudgeted807': sum, 'isbudgeted808': sum, 'isbudgeted47': sum, 'isbudgeted48': sum, 'isbudgeted49': sum, 'isbudgeted50': sum, 'isbudgeted31': sum, 'F78:V43': sum, 'F78:V44': sum, 'F78:V45': sum, 'F78:V46': sum, 'F78:V807': sum, 'F78:V808': sum, 'F78:V47': sum, 'F78:V48': sum, 'F78:V49': sum, 'F78:V50': sum, 'F78:V31': sum, 'total43': sum, 'total44': sum, 'total45': sum, 'total46': sum, 'total807': sum, 'total808': sum, 'total47': sum, 'total48': sum, 'total49': sum, 'total50': sum, 'total31': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)

compass_df_Instrument['isbudgeted43'] = compass_df_Instrument['isbudgeted43'] / compass_df_Instrument['total43']
compass_df_Instrument['isbudgeted44'] = compass_df_Instrument['isbudgeted44'] / compass_df_Instrument['total44']
compass_df_Instrument['isbudgeted45'] = compass_df_Instrument['isbudgeted45'] / compass_df_Instrument['total45']
compass_df_Instrument['isbudgeted46'] = compass_df_Instrument['isbudgeted46'] / compass_df_Instrument['total46']
compass_df_Instrument['isbudgeted807'] = compass_df_Instrument['isbudgeted807'] / compass_df_Instrument['total807']
compass_df_Instrument['isbudgeted808'] = compass_df_Instrument['isbudgeted808'] / compass_df_Instrument['total808']
compass_df_Instrument['isbudgeted47'] = compass_df_Instrument['isbudgeted47'] / compass_df_Instrument['total47']
compass_df_Instrument['isbudgeted48'] = compass_df_Instrument['isbudgeted48'] / compass_df_Instrument['total48']
compass_df_Instrument['isbudgeted49'] = compass_df_Instrument['isbudgeted49'] / compass_df_Instrument['total49']
compass_df_Instrument['isbudgeted50'] = compass_df_Instrument['isbudgeted50'] / compass_df_Instrument['total50']
compass_df_Instrument['isbudgeted31'] = compass_df_Instrument['isbudgeted31'] / compass_df_Instrument['total31']


# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total43', 'total44', 'total45', 'total46', 'total807', 'total808', 'total47', 'total48', 'total49', 'total50', 'total31'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]

new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean(), 'F78:V31': compass_df_Instrument_bis['F78:V31'].mean(), 'F78:V43': compass_df_Instrument_bis['F78:V43'].mean(), 
           'F78:V44': compass_df_Instrument_bis['F78:V44'].mean(), 'F78:V45': compass_df_Instrument_bis['F78:V45'].mean(), 'F78:V46': compass_df_Instrument_bis['F78:V46'].mean(), 'F78:V807': compass_df_Instrument_bis['F78:V807'].mean(), 'F78:V808': compass_df_Instrument_bis['F78:V808'].mean(), 'F78:V47': compass_df_Instrument_bis['F78:V47'].mean(), 
           'F78:V48': compass_df_Instrument_bis['F78:V48'].mean(), 'F78:V49': compass_df_Instrument_bis['F78:V49'].mean(), 'F78:V50': compass_df_Instrument_bis['F78:V50'].mean(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].mean(), 'isbudgeted43': compass_df_Instrument_bis['isbudgeted43'].mean(), 'isbudgeted44': compass_df_Instrument_bis['isbudgeted44'].mean(), 'isbudgeted45': compass_df_Instrument_bis['isbudgeted45'].mean(), 
           'isbudgeted46': compass_df_Instrument_bis['isbudgeted46'].mean(), 'isbudgeted807': compass_df_Instrument_bis['isbudgeted807'].mean(), 'isbudgeted808': compass_df_Instrument_bis['isbudgeted808'].mean(), 'isbudgeted47': compass_df_Instrument_bis['isbudgeted47'].mean(), 'isbudgeted48': compass_df_Instrument_bis['isbudgeted48'].mean(), 'isbudgeted49': compass_df_Instrument_bis['isbudgeted49'].mean(), 
           'isbudgeted50': compass_df_Instrument_bis['isbudgeted50'].mean(), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean()}

new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median(), 'F78:V31': compass_df_Instrument_bis['F78:V31'].median(), 'F78:V43': compass_df_Instrument_bis['F78:V43'].median(), 
           'F78:V44': compass_df_Instrument_bis['F78:V44'].median(), 'F78:V45': compass_df_Instrument_bis['F78:V45'].median(), 'F78:V46': compass_df_Instrument_bis['F78:V46'].median(), 'F78:V807': compass_df_Instrument_bis['F78:V807'].median(), 'F78:V808': compass_df_Instrument_bis['F78:V808'].median(), 'F78:V47': compass_df_Instrument_bis['F78:V47'].median(), 
           'F78:V48': compass_df_Instrument_bis['F78:V48'].median(), 'F78:V49': compass_df_Instrument_bis['F78:V49'].median(), 'F78:V50': compass_df_Instrument_bis['F78:V50'].median(), 'isbudgeted31': compass_df_Instrument_bis['isbudgeted31'].median(), 'isbudgeted43': compass_df_Instrument_bis['isbudgeted43'].median(), 'isbudgeted44': compass_df_Instrument_bis['isbudgeted44'].median(), 'isbudgeted45': compass_df_Instrument_bis['isbudgeted45'].median(), 
           'isbudgeted46': compass_df_Instrument_bis['isbudgeted46'].median(), 'isbudgeted807': compass_df_Instrument_bis['isbudgeted807'].median(), 'isbudgeted808': compass_df_Instrument_bis['isbudgeted808'].median(), 'isbudgeted47': compass_df_Instrument_bis['isbudgeted47'].median(), 'isbudgeted48': compass_df_Instrument_bis['isbudgeted48'].median(), 'isbudgeted49': compass_df_Instrument_bis['isbudgeted49'].median(), 
           'isbudgeted50': compass_df_Instrument_bis['isbudgeted50'].median(), 'isbudgeted': compass_df_Instrument_bis['isbudgeted'].median()}

# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)


# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']
for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "     

compass_df_Instrument_Label
        
# Prepare the dataframe for graph inclusion:
# Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31'])

# The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F78:V32"] = 0
for x in range(16, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F78:V32"] = compass_df_Instrument["F78:V32"] + compass_df_Instrument.iloc[:,x]
for x in range(16, compass_df_Instrument.shape[1]):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F78:V32"] #compass_df_Instrument.iloc[:,compass_df_Instrument.shape[1]]

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31']].copy()

    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted43', 'isbudgeted44', 'isbudgeted45', 'isbudgeted46', 'isbudgeted807', 'isbudgeted808', 'isbudgeted47', 'isbudgeted48', 'isbudgeted49', 'isbudgeted50', 'isbudgeted31'], aggfunc={'isbudgeted43': sum, 'isbudgeted44': sum, 'isbudgeted45': sum, 'isbudgeted46': sum, 'isbudgeted807': sum, 'isbudgeted808': sum, 'isbudgeted47': sum, 'isbudgeted48': sum, 'isbudgeted49': sum, 'isbudgeted50': sum, 'isbudgeted31': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F78:V43', 'F78:V44', 'F78:V45', 'F78:V46', 'F78:V807', 'F78:V808', 'F78:V47', 'F78:V48', 'F78:V49', 'F78:V50', 'F78:V31'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F78:V43': sum, 'F78:V44': sum, 'F78:V45': sum, 'F78:V46': sum, 'F78:V807': sum, 'F78:V808': sum, 'F78:V47': sum, 'F78:V48': sum, 'F78:V49': sum, 'F78:V50': sum, 'F78:V31': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F78:V43'] + compass_df_Instrument2['F78:V44'] + compass_df_Instrument2['F78:V45'] + compass_df_Instrument2['F78:V46'] + compass_df_Instrument2['F78:V807'] + compass_df_Instrument2['F78:V808'] + compass_df_Instrument2['F78:V47'] + compass_df_Instrument2['F78:V48'] + compass_df_Instrument2['F78:V49'] + compass_df_Instrument2['F78:V50'] + compass_df_Instrument2['F78:V31']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F78:V31': compass_df_Instrument2['F78:V31'].mean(), 'F78:V43': compass_df_Instrument2['F78:V43'].mean(), 'F78:V44': compass_df_Instrument2['F78:V44'].mean(), 'F78:V45': compass_df_Instrument2['F78:V45'].mean(), 'F78:V46': compass_df_Instrument2['F78:V46'].mean(), 
                   'F78:V47': compass_df_Instrument2['F78:V47'].mean(), 'F78:V48': compass_df_Instrument2['F78:V48'].mean(), 'F78:V49': compass_df_Instrument2['F78:V49'].mean(), 'F78:V50': compass_df_Instrument2['F78:V50'].mean(), 'F78:V807': compass_df_Instrument2['F78:V807'].mean(), 'F78:V808': compass_df_Instrument2['F78:V808'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted31': compass_df_Instrument2['F78:V31'].mean(), 'isbudgeted43': compass_df_Instrument2['F78:V43'].mean(), 'isbudgeted44': compass_df_Instrument2['F78:V44'].mean(), 'isbudgeted45': compass_df_Instrument2['F78:V45'].mean(), 'isbudgeted46': compass_df_Instrument2['F78:V46'].mean(), 
                   'isbudgeted47': compass_df_Instrument2['F78:V47'].mean(), 'isbudgeted48': compass_df_Instrument2['F78:V48'].mean(), 'isbudgeted49': compass_df_Instrument2['F78:V49'].mean(), 'isbudgeted50': compass_df_Instrument2['F78:V50'].mean(), 'isbudgeted807': compass_df_Instrument2['F78:V807'].mean(), 'isbudgeted808': compass_df_Instrument2['F78:V808'].mean()}


compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()


# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)

# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')
compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value']/100

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)


# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 10: Selection criteria for project grants for public research focuses <br> as a share of total instruments reported (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Selection criteria',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'],
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()  

Figure 10 indicates that scientific and societal impact anticipated are the two main criteria used by grants within the five selected countries. Alignment with national research priorities, track record of applicant and commercial impact anticipated complete the top 5 criteria among the five countries. All the other criteria are much less represented in comparison. Germany, Canada and Korea have a more diversified set of selection criteria compared to France and Japan. In France, the alignment with national research priorities is more significant compared to other countries, whereas in Japan the participation of early-career researchers stands out as a criteria that is relatively more frequently used.

4.3. Dedicated support to research infrastructures

4.3.1. Objectives

The last instrument addressed in this analysis concerns dedicated support to research infrastructures. This includes support for the creation of new facilities, resources and services used by the science community to conduct research and foster innovation. Figure 11 represents the prevalence of the objectives cited by these such instruments, across the five selected countries. Following the same approach as for previous charts, the data is aggregated by country using the NBW coefficients to calculate the proportion of each objective within the total pool of instruments. More than one objective can be targeted by instruments.

Show Code
# Data preparation:
    # Copy the central dataframe
compass_df_Instrument = compass_df.copy()

    # Filter on initiatives linked to "Project grants for public research"
compass_df_Instrument = compass_df_Instrument[(compass_df_Instrument.InstrumentTypeLabel == "Dedicated support to research infrastructures")]

    # Aggregate data by InstrumentID
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode', 'InstrumentID'], values=['NBW', 'isbudgeted', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862'], aggfunc={'NBW': pd.Series.mean, 'F120:V856': sum, 'F120:V857': sum, 'F120:V858': sum, 'F120:V859': sum, 'F120:V860': sum, 'F120:V861': sum, 'F120:V862': sum, 'isbudgeted': pd.Series.mean}, margins=True, margins_name='total', fill_value=0).reset_index()

    # Add an extra column to account for missing values:
compass_df_Instrument["F120:V863"] = 0

for x in range(5, compass_df_Instrument.shape[1]-3):
    compass_df_Instrument["F120:V863"] = compass_df_Instrument["F120:V863"] + compass_df_Instrument.iloc[:,x]

for x in range(0, compass_df_Instrument.shape[0]):
    if compass_df_Instrument["F120:V863"].loc[x] != 0:
        compass_df_Instrument["F120:V863"].loc[x]  = 0
    else:
        compass_df_Instrument["F120:V863"].loc[x]  = 1
        # Note: we've added "F120:V863" to the dictionary earlier

    # Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'NBW', 'isbudgeted', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863'])
compass_df_Instrument['isbudgeted856'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V856']
compass_df_Instrument['isbudgeted857'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V857']
compass_df_Instrument['isbudgeted858'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V858']
compass_df_Instrument['isbudgeted859'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V859']
compass_df_Instrument['isbudgeted860'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V860']
compass_df_Instrument['isbudgeted861'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V861']
compass_df_Instrument['isbudgeted862'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V862']
compass_df_Instrument['isbudgeted863'] = compass_df_Instrument['isbudgeted'] * compass_df_Instrument['F120:V863']

compass_df_Instrument['total856'] = compass_df_Instrument['F120:V856']
compass_df_Instrument['total857'] = compass_df_Instrument['F120:V857']
compass_df_Instrument['total858'] = compass_df_Instrument['F120:V858']
compass_df_Instrument['total859'] = compass_df_Instrument['F120:V859']
compass_df_Instrument['total860'] = compass_df_Instrument['F120:V860']
compass_df_Instrument['total861'] = compass_df_Instrument['F120:V861']
compass_df_Instrument['total862'] = compass_df_Instrument['F120:V862']
compass_df_Instrument['total863'] = compass_df_Instrument['F120:V863']

# Weight each data by NBW
for x in range(6, compass_df_Instrument.shape[1]-16):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] * compass_df_Instrument['NBW']

# Aggregate the data by countries
compass_df_Instrument = pd.pivot_table(data=compass_df_Instrument, index=['CountryLabel', 'OECD_STATUS', 'CountryCode'], values=['isbudgeted', 'InstrumentID', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863', 'isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862', 'isbudgeted863', 'total856', 'total857', 'total858', 'total859', 'total860', 'total861', 'total862', 'total863'], aggfunc={'isbudgeted': sum, 'InstrumentID':pd.Series.nunique, 'F120:V856': sum, 'F120:V857': sum, 'F120:V858': sum, 'F120:V859': sum, 'F120:V860': sum, 'F120:V861': sum, 'F120:V862': sum, 'F120:V863': sum, 'isbudgeted856': sum, 'isbudgeted857': sum, 'isbudgeted858': sum, 'isbudgeted859': sum, 'isbudgeted860': sum, 'isbudgeted861': sum, 'isbudgeted862': sum, 'isbudgeted863': sum, 'total856': sum, 'total857': sum, 'total858': sum, 'total859': sum, 'total860': sum, 'total861': sum, 'total862': sum, 'total863': sum}, margins=True, margins_name='total2', fill_value=0).reset_index()
index_names = compass_df_Instrument[(compass_df_Instrument['OECD_STATUS'] == '')].index
compass_df_Instrument.drop(index_names, inplace = True)

compass_df_Instrument['isbudgeted856'] = compass_df_Instrument['isbudgeted856'] / compass_df_Instrument['total856']
compass_df_Instrument['isbudgeted857'] = compass_df_Instrument['isbudgeted857'] / compass_df_Instrument['total857']
compass_df_Instrument['isbudgeted858'] = compass_df_Instrument['isbudgeted858'] / compass_df_Instrument['total858']
compass_df_Instrument['isbudgeted859'] = compass_df_Instrument['isbudgeted859'] / compass_df_Instrument['total859']
compass_df_Instrument['isbudgeted860'] = compass_df_Instrument['isbudgeted860'] / compass_df_Instrument['total860']
compass_df_Instrument['isbudgeted861'] = compass_df_Instrument['isbudgeted861'] / compass_df_Instrument['total861']
compass_df_Instrument['isbudgeted862'] = compass_df_Instrument['isbudgeted862'] / compass_df_Instrument['total862']
compass_df_Instrument['isbudgeted863'] = compass_df_Instrument['isbudgeted863'] / compass_df_Instrument['total863']

# Get rid of intermediate totals (not needed any more)
compass_df_Instrument = compass_df_Instrument.drop(['total856', 'total857', 'total858', 'total859', 'total860', 'total861', 'total862', 'total863'], axis=1).copy()

# Generate a dataframe filtered on OECD member only:
compass_df_Instrument_bis = compass_df_Instrument[compass_df_Instrument.OECD_STATUS == "Member"]
new_row = {'CountryLabel':'OECD Mean', 'OECD_STATUS': 'Member', 'CountryCode':'OECD2', 'F120:V856': round(compass_df_Instrument_bis['F120:V856'].mean(),1), 'F120:V857': round(compass_df_Instrument_bis['F120:V857'].mean(),1), 'F120:V858': round(compass_df_Instrument_bis['F120:V858'].mean(),1), 'F120:V859': round(compass_df_Instrument_bis['F120:V859'].mean(),1), 'F120:V860': compass_df_Instrument_bis['F120:V860'].mean(), 'F120:V861': compass_df_Instrument_bis['F120:V861'].mean(), 'F120:V862': compass_df_Instrument_bis['F120:V862'].mean(), 'F120:V863': compass_df_Instrument_bis['F120:V863'].mean(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted856': compass_df_Instrument_bis['isbudgeted856'].mean(), 'isbudgeted857': compass_df_Instrument_bis['isbudgeted857'].mean(), 'isbudgeted858': compass_df_Instrument_bis['isbudgeted858'].mean(), 'isbudgeted859': compass_df_Instrument_bis['isbudgeted859'].mean(), 'isbudgeted860': compass_df_Instrument_bis['isbudgeted860'].mean(), 'isbudgeted861': compass_df_Instrument_bis['isbudgeted861'].mean(), 'isbudgeted862': compass_df_Instrument_bis['isbudgeted862'].mean(), 'isbudgeted863': compass_df_Instrument_bis['isbudgeted863'].mean(), 
           'InstrumentID': compass_df_Instrument_bis['InstrumentID'].mean()}
new_row2 = {'CountryLabel':'OECD Median', 'OECD_STATUS': 'Member', 'CountryCode':'OECD', 'F120:V856': round(compass_df_Instrument_bis['F120:V856'].median(),1), 'F120:V857': round(compass_df_Instrument_bis['F120:V857'].median(),1), 'F120:V858': round(compass_df_Instrument_bis['F120:V858'].median(),1), 'F120:V859': round(compass_df_Instrument_bis['F120:V859'].median(),1), 'F120:V860': compass_df_Instrument_bis['F120:V860'].median(), 'F120:V861': compass_df_Instrument_bis['F120:V861'].median(), 'F120:V862': compass_df_Instrument_bis['F120:V862'].median(), 'F120:V863': compass_df_Instrument_bis['F120:V863'].median(), 
           'isbudgeted': compass_df_Instrument_bis['isbudgeted'].mean(), 'isbudgeted856': compass_df_Instrument_bis['isbudgeted856'].median(), 'isbudgeted857': compass_df_Instrument_bis['isbudgeted857'].median(), 'isbudgeted858': compass_df_Instrument_bis['isbudgeted858'].median(), 'isbudgeted859': compass_df_Instrument_bis['isbudgeted859'].median(), 'isbudgeted860': compass_df_Instrument_bis['isbudgeted860'].median(), 'isbudgeted861': compass_df_Instrument_bis['isbudgeted861'].median(), 'isbudgeted862': compass_df_Instrument_bis['isbudgeted862'].median(), 'isbudgeted863': compass_df_Instrument_bis['isbudgeted863'].median(), 
           'InstrumentID': compass_df_Instrument_bis['InstrumentID'].median()}


# Append the OECD data to the dataframes
compass_df_Instrument = compass_df_Instrument.append(new_row, ignore_index=True)

# Generate new labels dataframe
compass_df_Instrument_Label = compass_df_Instrument[['CountryLabel', 'isbudgeted', 'InstrumentID']].copy()
compass_df_Instrument_Label['CountryLabel2'] = compass_df_Instrument_Label['CountryLabel']
compass_df_Instrument_Label['isbudgeted'] = compass_df_Instrument_Label['InstrumentID'] * compass_df_Instrument_Label['isbudgeted']
for x in range(0, compass_df_Instrument_Label.shape[0]):
    if compass_df_Instrument_Label["CountryLabel"].loc[x] == "OECD Mean":
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " (" + (round(compass_df_Instrument_Label['InstrumentID'].loc[x],1).astype(float)).astype(str) + ") "
    else:
        compass_df_Instrument_Label['CountryLabel2'].loc[x] = compass_df_Instrument_Label['CountryLabel'].loc[x] + " ("  + (compass_df_Instrument_Label['InstrumentID'].loc[x].astype(int)).astype(str) + ") "     


# Prepare the dataframe for graph inclusion:
# Reorder the columns to have variables on the right:
compass_df_Instrument = compass_df_Instrument.reindex(columns=['OECD_STATUS', 'CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862','isbudgeted863','F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863'])

# The dataframe obtained shows counts of initiatives for each time horizon - Compute the proportion instead
compass_df_Instrument["F120:V864"] = 0
for x in range(13, compass_df_Instrument.shape[1]-1):
    compass_df_Instrument["F120:V864"] = compass_df_Instrument["F120:V864"] + compass_df_Instrument.iloc[:,x]
for x in range(13, compass_df_Instrument.shape[1]):
    compass_df_Instrument.iloc[:,x] = compass_df_Instrument.iloc[:,x] / compass_df_Instrument["F120:V864"] 

    # Select the countries to analyse
compass_df_Instrument2 = compass_df_Instrument.loc[compass_df_Instrument["CountryCode"].isin(list(my_countries["CountryCode"]))]

    # Trim the dataframe to only core data and transpose it
compass_df_Instrument2 = compass_df_Instrument2[['CountryLabel', 'CountryCode', 'InstrumentID', 'isbudgeted', 'isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862','isbudgeted863','F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863']].copy()
 
    # rename the InstrumentID to total
compass_df_Instrument2 = compass_df_Instrument2.rename(columns={"InstrumentID":"Total"})

    # Aggregate data by countries in two distinct dataframe
compass_df_Instrument2_isbudgeted = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['isbudgeted856', 'isbudgeted857', 'isbudgeted858', 'isbudgeted859', 'isbudgeted860', 'isbudgeted861', 'isbudgeted862', 'isbudgeted863'], aggfunc={'isbudgeted856': sum, 'isbudgeted857': sum, 'isbudgeted858': sum, 'isbudgeted859': sum, 'isbudgeted860': sum, 'isbudgeted861': sum, 'isbudgeted862': sum, 'isbudgeted863': sum}, fill_value=0).reset_index()
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='CountryLabel', ascending=True)

compass_df_Instrument2 = pd.pivot_table(data=compass_df_Instrument2, index=['CountryLabel', 'CountryCode'], values=['Total', 'isbudgeted', 'F120:V856', 'F120:V857', 'F120:V858', 'F120:V859', 'F120:V860', 'F120:V861', 'F120:V862', 'F120:V863'], aggfunc={'Total': sum, 'isbudgeted': pd.Series.mean, 'F120:V856': sum, 'F120:V857': sum, 'F120:V858': sum, 'F120:V859': sum, 'F120:V860': sum, 'F120:V861': sum, 'F120:V862': sum, 'F120:V863': sum}, fill_value=0).reset_index()
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='CountryLabel', ascending=True)
compass_df_Instrument2['Total'] = compass_df_Instrument2['F120:V856'] + compass_df_Instrument2['F120:V857'] + compass_df_Instrument2['F120:V858'] + compass_df_Instrument2['F120:V859'] + compass_df_Instrument2['F120:V860'] + compass_df_Instrument2['F120:V861'] + compass_df_Instrument2['F120:V862'] + compass_df_Instrument2['F120:V863']

# Prepare the dataframe for graph inclusion:

    # Due to lack of data, some countries must be removed: remove COUNTRY < 3 initiatives:

index_names = compass_df_Instrument2[(compass_df_Instrument2['isbudgeted']) < 3 ].index
compass_df_Instrument2 = compass_df_Instrument2.drop(index_names, inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(index_names, inplace = False)

    # Obtain the data in percent instead:
for x in range(2, compass_df_Instrument2.shape[1]-2):
       compass_df_Instrument2.iloc[:,x] = 100* compass_df_Instrument2.iloc[:,x]/ compass_df_Instrument2['Total']

# Rank the factors:
    # Remove the CountryCode:OECD (to exclude it from TOP computation) and store them in another dataframe (we'll need them again)
compass_df_Instrument2_filtered_OECD = compass_df_Instrument2.iloc[compass_df_Instrument2.shape[0]-1:,:]
index_names = compass_df_Instrument2[(compass_df_Instrument2['CountryCode'] == 'OECD') | (compass_df_Instrument2['CountryCode'] == 'OECD2')].index
compass_df_Instrument2.drop(index_names, inplace = True)

list_mean_order = {'CountryLabel':'TEST', 'F120:V856': compass_df_Instrument2['F120:V856'].mean(), 'F120:V857': compass_df_Instrument2['F120:V857'].mean(), 'F120:V858': compass_df_Instrument2['F120:V858'].mean(), 'F120:V859': compass_df_Instrument2['F120:V859'].mean(), 'F120:V860': compass_df_Instrument2['F120:V860'].mean(), 
                   'F120:V861': compass_df_Instrument2['F120:V861'].mean(), 'F120:V862': compass_df_Instrument2['F120:V862'].mean(), 'F120:V863': compass_df_Instrument2['F120:V863'].mean()}

list_mean_order_isbudgeted = {'CountryLabel':'TEST', 'isbudgeted856': compass_df_Instrument2['F120:V856'].mean(), 'isbudgeted857': compass_df_Instrument2['F120:V857'].mean(), 'isbudgeted858': compass_df_Instrument2['F120:V858'].mean(), 'isbudgeted859': compass_df_Instrument2['F120:V859'].mean(), 'isbudgeted860': compass_df_Instrument2['F120:V860'].mean(), 
                   'isbudgeted861': compass_df_Instrument2['F120:V861'].mean(), 'isbudgeted862': compass_df_Instrument2['F120:V862'].mean(), 'isbudgeted863': compass_df_Instrument2['F120:V863'].mean()}


compass_df_Instrument2 = compass_df_Instrument2.append(list_mean_order, ignore_index=True)
compass_df_Instrument2 = compass_df_Instrument2.append(compass_df_Instrument2_filtered_OECD, ignore_index=True)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.append(list_mean_order_isbudgeted, ignore_index=True)

compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['CountryCode'], axis=1).copy()
compass_df_Instrument2 = compass_df_Instrument2.drop(['CountryCode', 'isbudgeted', 'Total'], axis=1).copy()


# sort both dataframes
    # sort table compass_df_Instrument2_isbudgeted
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.T
new_header = compass_df_Instrument2_isbudgeted.iloc[0]
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted[1:]
compass_df_Instrument2_isbudgeted.columns = new_header
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.astype(float)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2_isbudgeted = compass_df_Instrument2_isbudgeted.drop(['TEST'], axis=1).copy()

    # sort table compass_df_Instrument2
compass_df_Instrument2 = compass_df_Instrument2.T
new_header = compass_df_Instrument2.iloc[0]
compass_df_Instrument2 = compass_df_Instrument2[1:]
compass_df_Instrument2.columns = new_header
compass_df_Instrument2 = compass_df_Instrument2.astype(float)
compass_df_Instrument2 = compass_df_Instrument2.sort_values(by='TEST', ascending=False).reset_index(inplace = False)
compass_df_Instrument2 = compass_df_Instrument2.drop(['TEST'], axis=1).copy()

# use the index to create a variable called axislocation
My_position = compass_df_Instrument2.copy()
My_position = My_position.reset_index()
My_position = My_position[["level_0", "index"]]
My_position.rename(columns={'level_0': 'axislocation'}, inplace=True)

# melt the two dataframes
compass_df_Instrument3 = compass_df_Instrument2_isbudgeted.copy()
new_header3 = list(compass_df_Instrument2_isbudgeted.columns[1:].values)
compass_df_Instrument3 = compass_df_Instrument3.melt(id_vars=["index"], value_vars=new_header3, var_name='Variable', value_name='Color')
compass_df_Instrument4 = compass_df_Instrument2.copy()
new_header4 = list(compass_df_Instrument4.columns[1:].values)
compass_df_Instrument4 = compass_df_Instrument4.melt(id_vars=["index"], value_vars=new_header4, var_name='Variable', value_name='Value')

compass_df_Instrument2 = compass_df_Instrument4.copy()
compass_df_Instrument2["Color"] = compass_df_Instrument3["Color"]

    # add the axislocation to compass_df_Instrument2

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      My_position, 
                      on ='index', 
                      how ='inner')

    # Rename the columns from Dict_Headers name

compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')    
My_position = pd.merge(My_position, 
                      Dict_Headers, 
                      left_on=['index'],
                      right_on=['Code'],
                      how ='inner')
compass_df_Instrument2 = pd.merge(compass_df_Instrument2, 
                        compass_df_Instrument_Label, 
                        left_on ='Variable', 
                        right_on=['CountryLabel'],
                        how ='inner')
compass_df_Instrument2['Color'] = compass_df_Instrument2['Color'].fillna(0)
compass_df_Instrument2['Value'] = compass_df_Instrument2['Value']/100

compass_df_Instrument2 = compass_df_Instrument2.drop(['Code', 'CountryLabel', 'Variable', 'isbudgeted'], axis=1).copy()
compass_df_Instrument2.rename(columns={'CountryLabel2': 'CountryLabel'}, inplace=True)

# Generate the graph:

import plotly.graph_objects as go

Value = compass_df_Instrument2["Value"]

fig = go.Figure(data=[go.Scatter(
    x= compass_df_Instrument2["axislocation"],
    y= compass_df_Instrument2["CountryLabel"],
    text=compass_df_Instrument2["Value"].map(lambda n: '{:,.2%}'.format(n)),
    texttemplate = "{0:.2f}%",
    mode='markers',
    marker=dict(
        size=Value,
        line=dict(width=2,color='DarkSlateGrey'),
        sizemode='area',
        sizeref=2.*max(Value)/(40.**2),
        sizemin=0,
        cmax=1,
        cmin=0,
        color=compass_df_Instrument2["Color"],
        colorscale='YlOrRd',
        showscale=True,
        colorbar=dict(
            title="Instrument funding data:",
            titleside="top",
            tickmode="array",
            tickvals=[0.1, 0.5, 0.9],
            ticktext=["None reported", "Reported for half <br>of instruments", "All reported"],
            tickfont=dict(family="verdana", color='#2a3f5f', size=10),
            title_font_family="verdana",
            title_font_size=13,
            title_font_color='#2a3f5f')
    )
)])

fig.update_layout(
    title={
        'text': "Figure 11: Objectives in dedicated support for research infrastructures <br> as a share of total instruments reported (budget-weighted)",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis=dict(
        showgrid=False,
        showline=True,
        linecolor='rgb(102, 102, 102)',
        tickfont_color='rgb(102, 102, 102)',
        showticklabels=True,
        dtick=0.5,
        tickformat=".0%",
        ticks='outside',
        tickcolor='rgb(102, 102, 102)',
    ),
    margin=dict(l=140, r=40, b=50, t=100),
    legend=dict(
        font_size=10,
        yanchor='middle',
        xanchor='left',
    ),
    width=800,
    height=550,
    paper_bgcolor='white',
    plot_bgcolor='white',
    hovermode='closest',
)

fig.update_yaxes(
    tickfont=dict(family="verdana", color='#2a3f5f', size=13),
    ticks="inside",
    tickson="boundaries",
    tickcolor="#444",
    gridwidth=1,
    gridcolor="#444",
    showgrid=True
)

fig.update_xaxes(
    title_text='Objectives reported',
    title_font_family="verdana",
    title_font_size=17,
    title_font_color='#2a3f5f',
    fixedrange=True,
    range=(min(My_position["axislocation"])-1, max(My_position["axislocation"]) +1 ),
    ticktext=My_position['Long Description'], 
    tickvals=My_position["axislocation"],
    showgrid=False,
    gridwidth=1,
    gridcolor="#444",
    ticks="outside",
    tickson="boundaries",
    tickcolor="#444",
    tickfont=dict(family="verdana", color='#2a3f5f', size=13)
)
fig.show()

Compared to other instruments analysed in this notebook, their sample size is much smaller, i.e. with about five instruments on average in OECD countries. Moreover, in many instances no objective has been reported. This raises questions on the comprehensiveness of the data and whether it is sufficient for interpreting the data for the five selected countries. With regards to the OECD mean, alignment with national research priorities, support to societal or environmental challenges and science-industry collaboration are the main objectives.

5. Outlook

The STIP Compass dataset is rich and potentially lends itself to cross-country comparison. The visualisations produced in this notebook are meant to be interpreted as indications of trends rather than precise measures, due to the qualitative nature of the data. The various policy indicators built in this notebook are normalised against the data reported by each country, to take into account different levels of reporting and make comparisons possible. Such indicators could benefit from being better placed within a broader national context, e.g. using statistical indicators and other kinds of evidence.

The main challenge in this type of analysis is that the levels of granularity in the data reported vary from country to country. The raw data needs to be inspected, in consultation with government officials and academics working in the field of public research, to determine whether it is representative of ongoing national policies. The visualisations in this notebook could also be used to this end, as country experts should be able to point out inconsistencies in the patterns emerging from the data.

This notebook is made available online as a proof of concept, to offer country policy analysts tools to reproduce and explore benchmarking national policies using the STIP Compass dataset.